Arxiv Papers

Author: Igor Melnyk

Subscribed: 10 · Played: 1,102

Description

Running out of time to catch up with new arXiv papers? We take the most impactful papers and present them as convenient podcasts. If you're a visual learner, we offer these papers in an engaging video format. Our service fills the gap between overly brief paper summaries and time-consuming full paper reads. You gain academic insights in a time-efficient, digestible format. Code behind this work: https://github.com/imelnyk/ArxivPapers
2489 Episodes
Paper: https://arxiv.org/abs//2508.21038
YouTube: https://www.youtube.com/@ArxivPapers | TikTok: https://www.tiktok.com/@arxiv_papers | Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id1692476016 | Spotify: https://podcasters.spotify.com/pod/show/arxiv-papers

Avengers-Pro is a test-time routing framework that optimizes performance and efficiency in LLMs, achieving state-of-the-art results by dynamically assigning queries to suitable models based on performance-efficiency scores.
Paper: https://arxiv.org/abs//2508.12631
YouTube: https://www.youtube.com/@ArxivPapers | TikTok: https://www.tiktok.com/@arxiv_papers | Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id1692476016 | Spotify: https://podcasters.spotify.com/pod/show/arxiv-papers

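To make the routing idea concrete, here is a minimal Python sketch of performance-efficiency routing. It is illustrative only: the Candidate fields, the linear alpha-weighted score, and the example numbers are assumptions for this sketch, not the paper's exact method.

```python
# Illustrative sketch of performance-efficiency routing (not the paper's exact method).
# Assumption: each candidate model carries an estimated accuracy and a normalized cost,
# and the router picks the model maximizing an alpha-weighted trade-off of the two.

from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    est_accuracy: float  # estimated quality on queries like this one, in [0, 1]
    est_cost: float      # normalized cost (tokens, latency, or dollars), in [0, 1]

def route(candidates: list[Candidate], alpha: float = 0.7) -> Candidate:
    """Pick the model with the best alpha-weighted performance-efficiency score."""
    def score(c: Candidate) -> float:
        return alpha * c.est_accuracy - (1.0 - alpha) * c.est_cost
    return max(candidates, key=score)

if __name__ == "__main__":
    pool = [
        Candidate("small-fast-model", est_accuracy=0.72, est_cost=0.05),
        Candidate("large-slow-model", est_accuracy=0.91, est_cost=0.60),
    ]
    # alpha close to 1 favors accuracy; alpha close to 0 favors cheap models.
    print(route(pool, alpha=0.9).name)  # -> large-slow-model
    print(route(pool, alpha=0.3).name)  # -> small-fast-model
```
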
Paper: https://arxiv.org/abs//2508.15734
YouTube: https://www.youtube.com/@ArxivPapers | TikTok: https://www.tiktok.com/@arxiv_papers | Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id1692476016 | Spotify: https://podcasters.spotify.com/pod/show/arxiv-papers

DeepConf enhances reasoning efficiency and performance in Large Language Models by filtering low-quality traces using internal confidence signals, achieving high accuracy and reduced token generation without extra training.
Paper: https://arxiv.org/abs//2508.15260
YouTube: https://www.youtube.com/@ArxivPapers | TikTok: https://www.tiktok.com/@arxiv_papers | Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id1692476016 | Spotify: https://podcasters.spotify.com/pod/show/arxiv-papers

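As a rough illustration of confidence-based trace filtering, the sketch below scores each sampled reasoning trace by the mean of its token log-probabilities, drops the low-confidence traces, and majority-votes over the survivors. The confidence metric, the keep_fraction parameter, and the trace format are simplifying assumptions, not DeepConf's exact procedure.

```python
# Simplified sketch of confidence-filtered self-consistency (not DeepConf's exact metric).
# Assumption: each sampled reasoning trace exposes per-token log-probabilities, and their
# mean serves as an internal confidence signal for discarding low-quality traces.

from collections import Counter
from statistics import mean

def filtered_vote(traces: list[dict], keep_fraction: float = 0.5) -> str:
    """traces: [{'answer': str, 'token_logprobs': [float, ...]}, ...]"""
    scored = [(mean(t["token_logprobs"]), t["answer"]) for t in traces]
    scored.sort(reverse=True)                                  # most confident first
    kept = scored[: max(1, int(len(scored) * keep_fraction))]  # filter low-confidence traces
    return Counter(ans for _, ans in kept).most_common(1)[0][0]

if __name__ == "__main__":
    traces = [
        {"answer": "42", "token_logprobs": [-0.1, -0.2, -0.1]},
        {"answer": "42", "token_logprobs": [-0.3, -0.2, -0.4]},
        {"answer": "17", "token_logprobs": [-2.1, -1.8, -2.5]},  # low confidence, filtered out
        {"answer": "42", "token_logprobs": [-0.5, -0.6, -0.4]},
    ]
    print(filtered_vote(traces))  # -> "42"
```
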
Intern-S1 is a multimodal model that excels in scientific tasks, outperforming both open-source and closed-source models, and aims to bridge the gap in high-value scientific research.
Paper: https://arxiv.org/abs//2508.15763
YouTube: https://www.youtube.com/@ArxivPapers | TikTok: https://www.tiktok.com/@arxiv_papers | Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id1692476016 | Spotify: https://podcasters.spotify.com/pod/show/arxiv-papers

The paper identifies search-time contamination (STC) in evaluating search-based LLM agents, revealing how data leaks compromise benchmark integrity and proposing best practices for trustworthy evaluations.
Paper: https://arxiv.org/abs//2508.13180
YouTube: https://www.youtube.com/@ArxivPapers | TikTok: https://www.tiktok.com/@arxiv_papers | Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id1692476016 | Spotify: https://podcasters.spotify.com/pod/show/arxiv-papers

This paper introduces Thyme, a multimodal model enhancing image manipulation and reasoning through executable code, achieving significant performance improvements in perception and reasoning tasks via innovative training strategies.
Paper: https://arxiv.org/abs//2508.11630
YouTube: https://www.youtube.com/@ArxivPapers | TikTok: https://www.tiktok.com/@arxiv_papers | Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id1692476016 | Spotify: https://podcasters.spotify.com/pod/show/arxiv-papers

The paper explores using large language models as efficient simulators for reinforcement learning tasks, introducing Self-Search RL to enhance internal knowledge utilization and reduce reliance on external search engines.
Paper: https://arxiv.org/abs//2508.10874
YouTube: https://www.youtube.com/@ArxivPapers | TikTok: https://www.tiktok.com/@arxiv_papers | Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id1692476016 | Spotify: https://podcasters.spotify.com/pod/show/arxiv-papers

This paper explores filtering dual-use topics from training data to enhance the tamper-resistance of open-weight AI systems, demonstrating significant improvements in adversarial fine-tuning resistance without degrading unrelated capabilities.
Paper: https://arxiv.org/abs//2508.06601
YouTube: https://www.youtube.com/@ArxivPapers | TikTok: https://www.tiktok.com/@arxiv_papers | Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id1692476016 | Spotify: https://podcasters.spotify.com/pod/show/arxiv-papers

This paper presents ASearcher, an open-source project enhancing search agents' capabilities through scalable RL training, achieving significant performance improvements in complex query handling and long-horizon search tasks.
Paper: https://arxiv.org/abs//2508.07976
YouTube: https://www.youtube.com/@ArxivPapers | TikTok: https://www.tiktok.com/@arxiv_papers | Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id1692476016 | Spotify: https://podcasters.spotify.com/pod/show/arxiv-papers
