OpenAI's Noam Brown, Ilge Akkaya and Hunter Lightman on o1 and Teaching LLMs to Reason Better

Update: 2024-10-02

Description

Combining LLMs with AlphaGo-style deep reinforcement learning has been a holy grail for many leading AI labs, and with o1 (aka Strawberry) we are seeing the most general merging of the two modes to date. o1 is admittedly better at math than essay writing, but it has already achieved SOTA on a number of math, coding and reasoning benchmarks.

Deep RL legend and now OpenAI researcher Noam Brown and teammates Ilge Akkaya and Hunter Lightman discuss the a-ha moments on the way to the release of o1, how it uses chains of thought and backtracking to think through problems, the discovery of strong test-time compute scaling laws, and what to expect as the model gets better.

Hosted by: Sonya Huang and Pat Grady, Sequoia Capital

Mentioned in this episode:

Learning to Reason with LLMs: Technical report accompanying the launch of OpenAI o1.

Generator verifier gap: Concept Noam explains in terms of what kinds of problems benefit from more inference-time compute.

Agent57: Outperforming the human Atari benchmark, 2020 paper where DeepMind demonstrated “the first deep reinforcement learning agent to obtain a score that is above the human baseline on all 57 Atari 2600 games.”

Move 37: Pivotal move in AlphaGo’s second game against Lee Sedol where it made a move so surprising that Sedol thought it must be a mistake, and only later discovered he had lost the game to a superhuman move.

IOI competition: OpenAI entered o1 into the International Olympiad in Informatics and received a Silver Medal.

System 1, System 2: The thesis of Daniel Kahneman’s pivotal book of behavioral economics, Thinking, Fast and Slow, which posited two distinct modes of thought, with System 1 being fast and instinctive and System 2 being slow and rational.

AlphaZero: The successor to AlphaGo, which learned a variety of games completely from scratch through self-play. Interestingly, self-play doesn’t seem to have a role in o1.

Solving Rubik’s Cube with a robot hand: Early OpenAI robotics paper that Ilge Akkaya worked on.

The Last Question: Science fiction story by Isaac Asimov with interesting parallels to scaling inference-time compute.

Strawberry: Why?

o1-mini: A smaller, more efficient version of o1 for applications that require reasoning without broad world knowledge.

00:00 - Introduction

01:33 - Conviction in o1

04:24 - How o1 works

05:04 - What is reasoning?

07:02 - Lessons from gameplay

09:14 - Generation vs verification

10:31 - What is surprising about o1 so far

11:37 - The trough of disillusionment

14:03 - Applying deep RL

14:45 - o1’s AlphaGo moment?

17:38 - A-ha moments

21:10 - Why is o1 good at STEM?

24:10 - Capabilities vs usefulness

25:29 - Defining AGI

26:13 - The importance of reasoning

28:39 - Chain of thought

30:41 - Implication of inference-time scaling laws

35:10 - Bottlenecks to scaling test-time compute

38:46 - Biggest misunderstanding about o1?

41:13 - o1-mini

42:15 - How should founders think about o1?
