“How Well Does RL Scale?” by Toby_Ord

Update: 2025-10-22

Description

This is the latest in a series of essays on AI Scaling.
You can find the others on my site.

Summary: RL-training for LLMs scales surprisingly poorly. Most of its gains come from letting LLMs productively use longer chains of thought, so that they can think longer about a problem. There is some improvement for a fixed length of answer, but not enough to drive AI progress. Given that the scaling up of pre-training compute has also stalled, we'll see less AI progress via compute scaling than you might have thought, and more of it will come from inference scaling (which has different effects on the world). That lengthens timelines and affects strategies for AI governance and safety.

The current era of improving AI capabilities using reinforcement learning (from verifiable rewards) involves two key types of scaling:

Scaling the amount of compute used for RL during training
Scaling [...]

---

Outline:

(09:46) How do these compare to pre-training scaling?

(14:16) Conclusion

---

First published:

October 22nd, 2025

Source:

https://www.lesswrong.com/posts/xpj6KhDM9bJybdnEe/how-well-does-rl-scale

---

Narrated by TYPE III AUDIO.

---
