Learning When to Plan: Efficiently Allocating Test-Time Compute for LLM Agents
Update: 2025-09-08
Description
In this episode, we discuss "Learning When to Plan: Efficiently Allocating Test-Time Compute for LLM Agents" by Davide Paglieri, Bartłomiej Cupiał, Jonathan Cook, Ulyana Piterbarg, Jens Tuyls, Edward Grefenstette, Jakob Nicolaus Foerster, Jack Parker-Holder, and Tim Rocktäschel. The paper introduces a framework that lets large language model agents decide dynamically when to plan during task execution, improving both efficiency and performance. The authors propose a two-stage training process that combines supervised fine-tuning with reinforcement learning to develop this capability. Experiments show that these dynamically planning agents are more sample-efficient, are better at achieving complex goals, and can be guided by human plans.