Localizing and Editing Knowledge in LLMs with Peter Hase - #679

Update: 2024-04-08
Description

Today we're joined by Peter Hase, a fifth-year PhD student in the University of North Carolina NLP lab. We discuss "scalable oversight" and the importance of developing a deeper understanding of how large neural networks make decisions. We learn how interpretability researchers probe the weight matrices of LLMs, and explore the two schools of thought on how LLMs store knowledge. Finally, we discuss the importance of deleting sensitive information from model weights, and how "easy-to-hard generalization" could increase the risks of releasing open-source foundation models.

The complete show notes for this episode can be found at twimlai.com/go/679.

Sam Charrington