Best AI papers explained
Compute-Optimal Scaling for Value-Based Deep RL

Update: 2025-08-25

Description

This paper investigates compute-optimal scaling strategies for value-based deep reinforcement learning (RL), focusing on how to allocate training resources efficiently. It examines the interplay between model size and batch size, identifying a phenomenon termed TD-overfitting, in which small models degrade at large batch sizes because their evolving TD target values are of lower quality. The authors propose a prescriptive rule for selecting the batch size that accounts for both model size and the updates-to-data (UTD) ratio, improving both compute and data efficiency. They also provide a framework for allocating compute (UTD ratio and model size) either to reach a target performance level or to maximize performance under a fixed budget, showing that these scaling decisions often follow predictable power-law relationships.
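
As a rough illustration of what such a prescription could look like in practice, here is a minimal Python sketch, assuming a batch-size rule expressed as a power law in model size and UTD ratio, plus the standard log-log fit used to estimate power-law scaling relationships. The function names, functional form, and all coefficients are illustrative assumptions and are not values taken from the paper.

```python
import numpy as np

def optimal_batch_size(n_params: float, utd: float,
                       c: float = 512.0, alpha: float = 0.25, beta: float = -0.5) -> int:
    """Hypothetical prescriptive rule: batch size scales as a power law in
    model size and the updates-to-data (UTD) ratio. The constant and the
    exponents here are placeholders, not values reported in the paper."""
    b = c * (n_params / 1e6) ** alpha * utd ** beta
    return max(1, int(round(b)))

def fit_power_law(x: np.ndarray, y: np.ndarray) -> tuple[float, float]:
    """Fit y ~= a * x**k by linear regression in log-log space, the standard
    way power-law scaling relationships are estimated from measurements."""
    k, log_a = np.polyfit(np.log(x), np.log(y), 1)
    return float(np.exp(log_a)), float(k)

if __name__ == "__main__":
    # Example: pick a batch size for a 3M-parameter critic trained at UTD = 8.
    print(optimal_batch_size(3e6, utd=8))

    # Example: recover a power law from synthetic (compute budget, performance) pairs.
    budgets = np.array([1e15, 1e16, 1e17, 1e18])
    perf = 2.0 * budgets ** 0.1  # synthetic data generated from a known law
    a, k = fit_power_law(budgets, perf)
    print(f"fitted: perf ~= {a:.2f} * budget^{k:.2f}")
```

Used this way, the fitted exponents would let one extrapolate how much additional compute a higher performance target demands, which is the kind of budget-allocation question the paper addresses.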

Enoch H. Kang