Learning When to Plan: Efficiently Allocating Test-Time Compute for LLM Agents

Update: 2025-09-08

Description

In this episode, we discuss Learning When to Plan: Efficiently Allocating Test-Time Compute for LLM Agents by Davide Paglieri, Bartłomiej Cupiał, Jonathan Cook, Ulyana Piterbarg, Jens Tuyls, Edward Grefenstette, Jakob Nicolaus Foerster, Jack Parker-Holder, and Tim Rocktäschel. The paper introduces a framework that lets large language model agents decide dynamically, during task execution, when to spend test-time compute on planning, which improves both efficiency and performance. The authors propose a two-stage training process to develop this capability: supervised fine-tuning followed by reinforcement learning. Experiments show that agents trained to plan dynamically are more sample-efficient, are better at achieving complex goals, and can be guided by human-written plans.
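To make the idea of "deciding when to plan" concrete, here is a minimal Python sketch of an agent loop that replans only when a decision function asks for it. It is an illustration under stated assumptions, not the paper's implementation: the class `DynamicPlanningAgent`, its fields `should_plan`, `make_plan`, and `act`, and the helper `run_episode` are all hypothetical names, and the hand-written rule in the demo stands in for the decision that the paper trains with supervised fine-tuning and reinforcement learning.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class DynamicPlanningAgent:
    """Toy agent that decides at every step whether to replan (hypothetical interface)."""

    should_plan: Callable[[str, Optional[str]], bool]  # (observation, current plan) -> replan?
    make_plan: Callable[[str], str]                    # observation -> new plan text
    act: Callable[[str, Optional[str]], str]           # (observation, plan) -> action
    plan: Optional[str] = None
    plans_made: int = 0

    def step(self, observation: str) -> str:
        # Spend extra compute on planning only when the decision function asks for it;
        # otherwise keep following the existing plan.
        if self.should_plan(observation, self.plan):
            self.plan = self.make_plan(observation)
            self.plans_made += 1
        return self.act(observation, self.plan)


def run_episode(agent: DynamicPlanningAgent, observations: List[str]) -> List[str]:
    """Feed a fixed list of observations to the agent and collect its actions."""
    return [agent.step(obs) for obs in observations]


if __name__ == "__main__":
    # Assumption: a hand-written rule replaces the learned "when to plan" decision.
    agent = DynamicPlanningAgent(
        should_plan=lambda obs, plan: plan is None or "unexpected" in obs,
        make_plan=lambda obs: f"plan for {obs}",
        act=lambda obs, plan: f"{obs}|{plan}",
    )
    actions = run_episode(agent, ["start", "corridor", "unexpected monster", "corridor"])
    print(actions)
    print("plans made:", agent.plans_made)
```

Counting `plans_made` against the number of steps gives a rough view of how much planning compute an episode used, which is the quantity a "plan only when needed" policy tries to keep low.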

agibreakdown
