Reflection AI’s Misha Laskin on the AlphaGo Moment for LLMs

Update: 2024-07-16

Description

LLMs are democratizing digital intelligence, but we’re all waiting for AI agents to take this to the next level by planning tasks and executing actions to actually transform the way we work and live our lives.

Yet despite incredible hype around AI agents, we’re still far from that “tipping point” with today’s best-in-class models. As one measure: coding agents are now scoring in the high teens (percent) on the SWE-bench benchmark for resolving GitHub issues, which far exceeds the previous unassisted baseline of 2% and the assisted baseline of 5%, but we’ve still got a long way to go.

Why is that? What do we need to truly unlock agentic capability for LLMs? What can we learn from researchers who have built both the most powerful agents in the world, like AlphaGo, and the most powerful LLMs in the world?

To find out, we’re talking to Misha Laskin, former research scientist at DeepMind. Misha is embarking on his vision to build the best agent models by bringing the search capabilities of RL together with LLMs at his new company, Reflection AI. He and his cofounder Ioannis Antonoglou, co-creator of AlphaGo and AlphaZero and RLHF lead for Gemini, are leveraging their unique insights to train the most reliable models for developers building agentic workflows.

Hosted by: Stephanie Zhan and Sonya Huang, Sequoia Capital

00:00 Introduction

01:11 Leaving Russia, discovering science

10:01 Getting into AI with Ioannis Antonoglou

15:54 Reflection AI and agents

25:41 The current state of AI agents

29:17 AlphaGo, AlphaZero and Gemini

32:58 LLMs don’t have a ground truth reward

37:53 The importance of post-training

44:12 Task categories for agents

45:54 Attracting talent

50:52 How far away are capable agents?

56:01 Lightning round

Mentioned:

The Feynman Lectures on Physics: The classic text that got Misha interested in science.

Mastering the game of Go with deep neural networks and tree search: The original 2016 AlphaGo paper.

Mastering the game of Go without human knowledge: The 2017 AlphaGo Zero paper.

Scaling Laws for Reward Model Overoptimization: OpenAI paper on how reward models can be gamed at all scales for all algorithms.

Mapping the Mind of a Large Language Model: Article about Anthropic’s mechanistic interpretability paper that identifies how millions of concepts are represented inside Claude Sonnet.

Pieter Abbeel: Berkeley professor and founder of Covariant with whom Misha studied.

A2C and A3C: Advantage Actor Critic and Asynchronous Advantage Actor Critic, the two algorithms developed by Misha’s manager at DeepMind, Volodymyr Mnih, that defined reinforcement learning and deep reinforcement learning


