“A Pragmatic Vision for Interpretability” by Neel Nanda
Update: 2025-12-08
Description
Executive Summary
- The Google DeepMind mechanistic interpretability team has made a strategic pivot over the past year, from ambitious reverse-engineering to a focus on pragmatic interpretability:
- Trying to directly solve problems on the critical path to AGI going well[1]
- Carefully choosing problems according to our comparative advantage
- Measuring progress with empirical feedback on proxy tasks
- We believe that, on the margin, more researchers who share our goals should take a pragmatic approach to interpretability, both in industry and academia, and we call on people to join us
- Our proposed scope is broad and includes much non-mech interp work, but we see this as the natural approach for mech interp researchers to have impact
- Specifically, we’ve found that the skills, tools and tastes of mech interp researchers transfer well to important and neglected problems outside “classic” mech interp
- See our companion piece for more on which research areas and theories of change we think are promising
- Why pivot now? We think that times have changed.
- Models are far more capable, bringing new questions within empirical reach
- We have been [...]
Outline:
(00:10 ) Executive Summary
(03:00 ) Introduction
(03:44 ) Motivating Example: Steering Against Evaluation Awareness
(06:21 ) Our Core Process
(08:20 ) Which Beliefs Are Load-Bearing?
(10:25 ) Is This Really Mech Interp?
(11:27 ) Our Comparative Advantage
(14:57 ) Why Pivot?
(15:20 ) What's Changed In AI?
(16:08 ) Reflections On The Field's Progress
(18:18 ) Task Focused: The Importance Of Proxy Tasks
(18:52 ) Case Study: Sparse Autoencoders
(21:35 ) Ensure They Are Good Proxies
(23:11 ) Proxy Tasks Can Be About Understanding
(24:49 ) Types Of Projects: What Drives Research Decisions
(25:18 ) Focused Projects
(28:31 ) Exploratory Projects
(28:35 ) Curiosity Is A Double-Edged Sword
(30:56 ) Starting In A Robustly Useful Setting
(34:45 ) Time-Boxing
(36:27 ) Worked Examples
(39:15 ) Blending The Two: Tentative Proxy Tasks
(41:23 ) What's Your Contribution?
(43:08 ) Jack Lindsey's Approach
(45:44 ) Method Minimalism
(46:12 ) Case Study: Shutdown Resistance
(48:28 ) Try The Easy Methods First
(50:02 ) When Should We Develop New Methods?
(51:36 ) Call To Action
(53:04 ) Acknowledgments
(54:02 ) Appendix: Common Objections
(54:08 ) Aren't You Optimizing For Quick Wins Over Breakthroughs?
(56:34 ) What If AGI Is Fundamentally Different?
(57:30 ) I Care About Scientific Beauty and Making AGI Go Well
(58:09 ) Is This Just Applied Interpretability?
(58:44 ) Are You Saying This Because You Need To Prove Yourself Useful To Google?
(59:10 ) Does This Really Apply To People Outside AGI Companies?
(59:40 ) Aren't You Just Giving Up?
(01:00:04 ) Is Ambitious Reverse-engineering Actually Overcrowded?
(01:00:48 ) Appendix: Defining Mechanistic Interpretability
(01:01:44 ) Moving Toward Mechanistic OR Interpretability
The original text contained 47 footnotes which were omitted from this narration.
---
First published:
December 1st, 2025
Source:
https://www.lesswrong.com/posts/StENzDcD3kpfGJssR/a-pragmatic-vision-for-inter