Best AI papers explained
Compute-Optimal Scaling for Value-Based Deep RL

Update: 2025-08-25

Description

This paper investigates compute-optimal scaling strategies for value-based deep reinforcement learning (RL), focusing on how to allocate training resources efficiently. It examines the interplay between model size and batch size, identifying a phenomenon termed TD-overfitting, in which small models degrade at large batch sizes because their evolving TD target values are of lower quality. The authors propose a prescriptive rule for selecting the batch size that accounts for both model size and the updates-to-data (UTD) ratio, improving both compute and data efficiency. They also provide a framework for allocating compute (UTD ratio and model size) either to reach a target performance level or to maximize performance under a fixed budget, showing that these scaling decisions often follow predictable power-law relationships.
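
As a rough illustration of what such a prescription could look like in practice, here is a minimal Python sketch, assuming a batch-size rule expressed as a power law in model size and UTD ratio, plus the standard log-log fit used to estimate power-law scaling relationships. The function names, functional form, and all coefficients are illustrative assumptions and are not values taken from the paper.

```python
import numpy as np

def optimal_batch_size(n_params: float, utd: float,
                       c: float = 512.0, alpha: float = 0.25, beta: float = -0.5) -> int:
    """Hypothetical prescriptive rule: batch size scales as a power law in
    model size and the updates-to-data (UTD) ratio. The constant and the
    exponents here are placeholders, not values reported in the paper."""
    b = c * (n_params / 1e6) ** alpha * utd ** beta
    return max(1, int(round(b)))

def fit_power_law(x: np.ndarray, y: np.ndarray) -> tuple[float, float]:
    """Fit y ~= a * x**k by linear regression in log-log space, the standard
    way power-law scaling relationships are estimated from measurements."""
    k, log_a = np.polyfit(np.log(x), np.log(y), 1)
    return float(np.exp(log_a)), float(k)

if __name__ == "__main__":
    # Example: pick a batch size for a 3M-parameter critic trained at UTD = 8.
    print(optimal_batch_size(3e6, utd=8))

    # Example: recover a power law from synthetic (compute budget, performance) pairs.
    budgets = np.array([1e15, 1e16, 1e17, 1e18])
    perf = 2.0 * budgets ** 0.1  # synthetic data generated from a known law
    a, k = fit_power_law(budgets, perf)
    print(f"fitted: perf ~= {a:.2f} * budget^{k:.2f}")
```

Used this way, the fitted exponents would let one extrapolate how much additional compute a higher performance target demands, which is the kind of budget-allocation question the paper addresses.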

Enoch H. Kang