Sample Complexity and Representation Ability of Test-time Scaling Paradigms

Update: 2025-09-09

Description

This paper explores **theoretical foundations** for **test-time scaling paradigms** in large language models (LLMs). It **analyzes the sample efficiency** of repeated sampling methods: **self-consistency** needs Θ(1/∆²) samples to return the correct answer reliably, while **best-of-n** needs only Θ(1/∆), where ∆ is the probability gap between the correct answer and the second most likely answer. The paper also **investigates the expressive power of self-correction**, demonstrating that Transformers with verifier feedback can simulate online learning, which enables a **single Transformer architecture to solve multiple tasks** without prior task knowledge. The authors **empirically validate their theoretical findings**, showing that self-correction significantly enhances accuracy, especially in larger models.
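The gap between the two sampling strategies can be seen in a small Monte Carlo sketch. This is an illustration under stated assumptions, not the paper's experimental setup: the answer distribution `PROBS` is hypothetical, and best-of-n is modeled with an idealized scorer that ranks each sampled answer by its true probability, so it succeeds as soon as the correct answer is sampled once. Self-consistency is modeled as a plain majority vote.

```python
import random
from collections import Counter


def sample_answers(probs, n, rng):
    """Draw n answers (indices into probs) i.i.d. from the answer distribution."""
    return rng.choices(range(len(probs)), weights=probs, k=n)


def self_consistency(answers, rng):
    """Majority vote over the sampled answers; ties are broken uniformly at random."""
    counts = Counter(answers)
    top = max(counts.values())
    return rng.choice(sorted(a for a, c in counts.items() if c == top))


def best_of_n(answers, probs):
    """Return the sampled answer with the highest score.

    Assumption: the score is the true answer probability (an idealized scorer).
    """
    return max(set(answers), key=lambda a: probs[a])


def success_rates(probs, n, trials=2000, seed=0):
    """Estimate how often each strategy returns the most likely (correct) answer."""
    rng = random.Random(seed)
    correct = max(range(len(probs)), key=lambda a: probs[a])
    sc_hits = bon_hits = 0
    for _ in range(trials):
        answers = sample_answers(probs, n, rng)
        sc_hits += self_consistency(answers, rng) == correct
        bon_hits += best_of_n(answers, probs) == correct
    return sc_hits / trials, bon_hits / trials


# Hypothetical distribution: correct answer 0.30, runner-up 0.25, so the gap is 0.05.
PROBS = [0.30, 0.25, 0.15, 0.15, 0.15]

if __name__ == "__main__":
    for n in (5, 20, 80, 320):
        sc, bon = success_rates(PROBS, n)
        print(f"n={n:4d}  self-consistency={sc:.3f}  best-of-n={bon:.3f}")
```

In this toy setting the majority vote has to resolve a small difference in frequencies between the top two answers, which takes many samples when the gap is small, while the idealized best-of-n selector only needs the correct answer to appear at least once.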

In Channel

Sample Complexity and Representation Ability of Test-time Scaling Paradigms

2025-09-09 · 19:45

RL's Razor: Why Online RL Forgets Less

2025-09-07 · 24:56

Why Language Models Hallucinate

2025-09-06 · 17:40

ALFA: Aligning LLMs to Ask Good Questions: A Case Study in Clinical Reasoning

2025-09-06 · 16:12

Sample Efficient Preference Alignment in LLMs via Active Exploration

2025-09-06 · 15:05

Adventures in Demand Analysis Using AI

2025-09-04 · 13:59

Memento: Fine-tuning LLM Agents without Fine-tuning LLMs

2025-09-01 · 18:59

On the Theoretical Limitations of Embedding-Based Retrieval

2025-08-31 · 17:25

Performance Prediction for Large Systems via Text-to-Text Regression

2025-08-30 · 15:53

Demystifying the Visual Quality Paradox in Multimodal Large Language Models

2025-08-30 · 16:47

Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL

2025-08-30 · 20:15

Compute-Optimal Scaling for Value-Based Deep RL

2025-08-25 · 16:02

LLM-based Conversational Recommendation Agents with Collaborative Verbalized Experience

2025-08-23 · 17:05

Signal and Noise: Evaluating Language Model Benchmarks

2025-08-23 · 12:01

Breaking Feedback Loops in Recommender Systems with Causal Inference

2025-08-21 · 12:54

RAG is Dead, Context Engineering is King: Building Reliable AI Systems

2025-08-20 · 19:55

A Survey of Personalization: From RAG to Agent

2025-08-20 · 25:00

Facilitating the Adoption of Causal Inference Methods Through LLM-Empowered Co-Pilot

2025-08-19 · 22:28

Performance Prediction for Large Systems via Text-to-Text Regression

2025-08-16 · 19:09

Sample More to Think Less: Group Filtered Policy Optimization for Concise Reasoning

2025-08-15 · 27:47

Enoch H. Kang
