“Understanding and Controlling LLM Generalization” by Daniel Tan

Update: 2025-11-15

Description

A distillation of my long-term research agenda and current thinking. I welcome takes on this.

Why study generalization?

I'm interested in studying how LLMs generalise - when presented with multiple policies that achieve similar loss, which ones tend to be learned by default?

I claim this is pretty important for AI safety:

Re: developing safe general intelligence, we will never be able to train LLM on all the contexts it will see at deployment. To prevent goal misgeneralization, it's necessary to understand how LLMs generalise their training OOD.
Re: loss of control risks specifically, certain important kinds of misalignment (reward hacking, scheming) are difficult to 'select against' at the behavioural level. A fallback for this would be if LLMs had an innate 'generalization propensity' to learn aligned policies over misaligned ones.

This motivates research into LLM inductive biases. Or as I'll call them from here on, 'generalization propensities'.

I have two high-level goals:

Understanding the complete set of causal factors that drive generalization.
Controlling generalization by intervening on these causal factors in a principled way.

Defining "generalization propensity"

To study generalization propensities, we need two things:

"Generalization propensity evaluations" (GPEs)
[...]

---

Outline:

(00:18 ) Why study generalization?

(01:30 ) Defining generalization propensity

(02:29 ) Research questions

---

First published:

November 14th, 2025

Source:

https://www.lesswrong.com/posts/ZSQaT2yxNNZ3eLxRd/understanding-and-controlling-llm-generalization

---

Narrated by TYPE III AUDIO.

Comments

In Channel

“Increasing marginal returns to effort are common” by habryka

2025-11-1513:15

“Generation Ship: A Protest Song For PauseAI” by LoganStrohl

2025-11-1502:57

“‘But You’d Like To Feel Companionate Love, Right? ... Right?’” by johnswentworth

2025-11-1505:18

“Understanding and Controlling LLM Generalization” by Daniel Tan

2025-11-1503:55

“AI Craziness: Additional Suicide Lawsuits and The Fate of GPT-4o” by Zvi

2025-11-1513:13

“AI Corrigibility Debate: Max Harms vs. Jeremy Gillen” by Liron, Max Harms, Jeremy Gillen

2025-11-1402:24:59

“10” by Ben Pace

2025-11-1407:55

“Everyone has a plan until they get lied to the face” by Screwtape

2025-11-1412:49

“The rare, deadly virus lurking in the Southwest US, and the bigger picture” by eukaryote

2025-11-1431:38

“Creditworthiness should not be for sale” by habryka

2025-11-1414:53

“Types of systems that could be useful for agent foundations” by Alex_Altair

2025-11-1409:14

“The Charge of the Hobby Horse” by TsviBT

2025-11-1410:51

“Two can keep a secret if one is dead. So please share everything with at least one person.” by habryka

2025-11-1403:57

“Why Truth First?” by johnswentworth

2025-11-1411:54

“Orient Speed in the 21st Century” by Raemon

2025-11-1405:57

“Tell people as early as possible it’s not going to work out” by habryka

2025-11-1403:20

“Epistemic Spot Check: Expected Value of Donating to Alex Bores’s Congressional Campaign” by MichaelDickens

2025-11-1411:50

“(Fantasy) -> (Planning): A Core Mental Move For Agentic Humans?” by johnswentworth

2025-11-1403:59

“Weight-sparse transformers have interpretable circuits” by leogao

2025-11-1302:48

“What’s so hard about...? A question worth asking” by Ruby

2025-11-1304:48

00:00

1.0x

“Understanding and Controlling LLM Generalization” by Daniel Tan

#box-pro-ellipsis-176322261714534{-webkit-line-clamp:2;}“Understanding and Controlling LLM Generalization” by Daniel Tan

“Understanding and Controlling LLM Generalization” by Daniel Tan

“Understanding and Controlling LLM Generalization” by Daniel Tan