Diffusion LLMs - The Fastest LLMs Ever Built | Stefano Ermon, cofounder of Inception Labs
Description
Stefano Ermon is the cofounder of Inception Labs and an associate professor at Stanford. Inception is developing a new type of AI model called Diffusion LLMs.
Stefano's favorite book: If on a Winter's Night a Traveler (Author: Italo Calvino)
(00:01) Introduction
(00:38) What are autoregressive LLMs and how do they work
(02:28) How diffusion LLMs rethink generation
(04:02) The ceiling of autoregressive LLMs: cost, latency, reliability
(06:19) Why diffusion LLMs are commercially viable now
(09:12) Parallel refinement: how diffusion models generate text
(12:05) Understanding diffusion steps and efficiency
(13:49) Hardest engineering challenges at Inception
(15:23) From research to production: the power of data
(16:24) Where diffusion LLMs still lag behind
(18:18) Evaluations and benchmarks for diffusion LLMs
(20:20) Developer experience and OpenAI-compatible API
(21:47) Economics and GPU efficiency
(23:38) Hardware and runtime stack
(24:58) Competition and the evolving diffusion LLM landscape
(27:01) Where diffusion will win first: coding and agentic systems
(30:13) How diffusion changes infra, serving, and hardware design
(33:04) What’s next at Inception: reasoning and multimodality
(35:20) Rapid Fire Round
--------
Where to find Stefano Ermon:
LinkedIn: https://www.linkedin.com/in/ermon/
--------
Where to find Prateek Joshi:
Research column: https://www.infrastartups.com
Newsletter: https://prateekjoshi.substack.com
Website: https://prateekj.com
LinkedIn: https://www.linkedin.com/in/prateek-joshi-infinite
X: https://x.com/prateekvjoshi







