The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)

779 Episodes

The Evolution of Reasoning in Small Language Models with Yejin Choi - #761

2026-01-29 | 01:05:51

Today, we're joined by Yejin Choi, professor and senior fellow at Stanford University in the Computer Science Department and the Institute for Human-Centered AI (HAI). In this conversation, we explore Yejin’s recent work on making small language models reason more effectively. We discuss how high-quality, diverse data plays a central role in closing the intelligence gap between small and large models, and how combining synthetic data generation, imitation learning, and reinforcement learning can unlock stronger reasoning capabilities in smaller models. Yejin explains the risks of homogeneity in model outputs and mode collapse highlighted in her “Artificial Hivemind” paper, and its impacts on human creativity and knowledge. We also discuss her team's novel approaches, including reinforcement learning as a pre-training objective, where models are incentivized to “think” before predicting the next token, and "Prismatic Synthesis," a gradient-based method for generating diverse synthetic math data while filtering overrepresented examples. Additionally, we cover the societal implications of AI and the concept of pluralistic alignment—ensuring AI reflects the diverse norms and values of humanity. Finally, Yejin shares her mission to democratize AI beyond large organizations and offers her predictions for the coming year. The complete show notes for this episode can be found at https://twimlai.com/go/761.

Intelligent Robots in 2026: Are We There Yet? with Nikita Rudin - #760

2026-01-08 | 01:06:07

Today, we're joined by Nikita Rudin, co-founder and CEO of Flexion Robotics, to discuss the gap between current robotic capabilities and what’s required to deploy fully autonomous robots in the real world. Nikita explains how reinforcement learning and simulation have driven rapid progress in robot locomotion, and why locomotion is still far from “solved.” We dig into the sim2real gap, and how adding visual inputs introduces noise and significantly complicates sim-to-real transfer. We also explore the debate between end-to-end models and modular approaches, and why separating locomotion, planning, and semantics remains a pragmatic approach today. Nikita also introduces the concept of "real-to-sim," which uses real-world data to refine simulation parameters for higher-fidelity training. He discusses how reinforcement learning, imitation learning, and teleoperation data are combined to train robust policies for both quadruped and humanoid robots, and describes Flexion's hierarchical approach, which uses pre-trained Vision-Language Models (VLMs) for high-level task orchestration with Vision-Language-Action (VLA) models and low-level whole-body trackers. Finally, Nikita shares a behind-the-scenes look at humanoid robot demos, his take on reinforcement learning in simulation versus the real world, and the nuances of reward tuning, and offers practical advice for researchers and practitioners looking to get started in robotics today. The complete show notes for this episode can be found at https://twimlai.com/go/760.

Rethinking Pre-Training for Agentic AI with Aakanksha Chowdhery - #759

2025-12-17 | 53:00

Today, we're joined by Aakanksha Chowdhery, member of technical staff at Reflection, to explore the fundamental shifts required to build true agentic AI. While the industry has largely focused on post-training techniques to improve reasoning, Aakanksha draws on her experience leading pre-training efforts for Google’s PaLM and early Gemini models to argue that pre-training itself must be rethought to move beyond static benchmarks. We explore the limitations of next-token prediction for multi-step workflows and examine how attention mechanisms, loss objectives, and training data must evolve to support long-form reasoning and planning. Aakanksha shares insights on the difference between context retrieval and actual reasoning, the importance of "trajectory" training data, and why scaling remains essential for discovering emergent agentic capabilities like error recovery and dynamic tool learning. The complete show notes for this episode can be found at https://twimlai.com/go/759.

Why Vision Language Models Ignore What They See with Munawar Hayat - #758

2025-12-09 | 57:40

In this episode, we’re joined by Munawar Hayat, researcher at Qualcomm AI Research, to discuss a series of papers presented at NeurIPS 2025 focusing on multimodal and generative AI. We dive into the persistent challenge of object hallucination in Vision-Language Models (VLMs), why models often discard visual information in favor of pre-trained language priors, and how his team used attention-guided alignment to enforce better visual grounding. We also explore a novel approach to generalized contrastive learning designed to solve complex, composed retrieval tasks—such as searching via combined text and image queries—without increasing inference costs. Finally, we cover the difficulties generative models face when rendering multiple human subjects, and the new "MultiHuman Testbench" his team created to measure and mitigate issues like identity leakage and attribute blending. Throughout the discussion, we examine how these innovations align with the need for efficient, on-device AI deployment. The complete show notes for this episode can be found at https://twimlai.com/go/758.

Scaling Agentic Inference Across Heterogeneous Compute with Zain Asgar - #757

2025-12-02 | 48:27

In this episode, Zain Asgar, co-founder and CEO of Gimlet Labs, joins us to discuss heterogeneous AI inference across diverse hardware. Zain argues that the current industry standard of running all AI workloads on high-end GPUs is unsustainable for agents, which consume significantly more tokens than traditional LLM applications. We explore Gimlet’s approach to heterogeneous inference, which involves disaggregating workloads across a mix of hardware (from H100s to older GPUs and CPUs) to optimize unit economics without sacrificing performance. We dive into their "three-layer cake" architecture: workload disaggregation, a compilation layer that maps models to specific hardware targets, and a novel system that uses LLMs to autonomously rewrite and optimize compute kernels. Finally, we discuss the complexities of networking in heterogeneous environments, the trade-offs between numerical precision and application accuracy, and the future of hardware-aware scheduling. The complete show notes for this episode can be found at https://twimlai.com/go/757.

Proactive Agents for the Web with Devi Parikh - #756

2025-11-19 | 55:48

Today, we're joined by Devi Parikh, co-founder and co-CEO of Yutori, to discuss browser use models and a future where we interact with the web through proactive, autonomous agents. We explore the technical challenges of creating reliable web agents, the advantages of visually-grounded models that operate on screenshots rather than the browser’s more brittle document object model, or DOM, and why this counterintuitive choice has proven far more robust and generalizable for handling complex web interfaces. Devi also shares insights into Yutori’s training pipeline, which has evolved from supervised fine-tuning to include rejection sampling and reinforcement learning. Finally, we discuss how Yutori’s “Scouts” agents orchestrate multiple tools and sub-agents to handle complex queries, the importance of background, "ambient" operation for these systems, and what the path looks like from simple monitoring to full task automation on the web. The complete show notes for this episode can be found at https://twimlai.com/go/756.

AI Orchestration for Smart Cities and the Enterprise with Robin Braun and Luke Norris - #755

2025-11-12 | 54:59

Today, we're joined by Robin Braun, VP of AI business development for hybrid cloud at HPE, and Luke Norris, co-founder and CEO of Kamiwaza, to discuss how AI systems can be used to automate complex workflows and unlock value from legacy enterprise data. Robin and Luke detail high-impact use cases from HPE and Kamiwaza’s collaboration on an “Agentic Smart City” project for Vail, Colorado, including remediation and automation of website accessibility for 508 compliance, digitization and understanding of deed restrictions, and combining contextual information with camera feeds for fire detection and risk assessment. Additionally, we discuss the role of private cloud infrastructure in overcoming challenges like cost, data privacy, and compliance. Robin and Luke also share their lessons learned, including the importance of fresh data, and the value of a "mud puddle by mud puddle" approach in achieving practical AI wins. The complete show notes for this episode can be found at https://twimlai.com/go/755.

Building an AI Mathematician with Carina Hong - #754

2025-11-04 | 54:52

In this episode, Carina Hong, founder and CEO of Axiom, joins us to discuss her work building an "AI Mathematician." Carina explains why this is a pivotal moment for AI in mathematics, citing a convergence of three key areas: the advanced reasoning capabilities of modern LLMs, the rise of formal proof languages like Lean, and breakthroughs in code generation. We explore the core technical challenges, including the massive data gap between general-purpose code and formal math code, and the difficult problem of "autoformalization," or translating natural language proofs into a machine-verifiable format. Carina also shares Axiom's vision for a self-improving system that uses a self-play loop of conjecturing and proving to discover new mathematical knowledge. Finally, we discuss the broader applications of this technology in areas like formal verification for high-stakes software and hardware. The complete show notes for this episode can be found at https://twimlai.com/go/754.

High-Efficiency Diffusion Models for On-Device Image Generation and Editing with Hung Bui - #753

2025-10-28 | 51:53

In this episode, Hung Bui, Technology Vice President at Qualcomm, joins us to explore the latest high-efficiency techniques for running generative AI, particularly diffusion models, on-device. We dive deep into the technical challenges of deploying these models, which are powerful but computationally expensive due to their iterative sampling process. Hung details his team's work on SwiftBrush and SwiftEdit, which enable high-quality text-to-image generation and editing in a single inference step. He explains their novel distillation framework, where a multi-step teacher model guides the training of an efficient, single-step student model. We explore the architecture and training, including the use of a secondary 'coach' network that aligns the student's denoising function with the teacher's, allowing the model to bypass the iterative process entirely. Finally, we discuss how these efficiency breakthroughs pave the way for personalized on-device agents and the challenges of running reasoning models with techniques like inference-time scaling under a fixed compute budget. The complete show notes for this episode can be found at https://twimlai.com/go/753.

Vibe Coding's Uncanny Valley with Alexandre Pesant - #752

2025-10-22 | 01:12:06

Today, we're joined by Alexandre Pesant, AI lead at Lovable, to discuss the evolution and practice of vibe coding. Alex shares his take on how AI is enabling a shift in software development from typing characters to expressing intent, creating a new layer of abstraction similar to how high-level code compiles to machine code. We explore the current capabilities and limitations of coding agents, the importance of context engineering, and the practices that separate successful vibe coders from frustrated ones. Alex also shares Lovable’s technical journey, from an early, complex agent architecture that failed, to a simpler workflow-based system, and back again to an agentic approach as foundation models improved. He also details the company's massive scaling challenges (like accidentally taking down GitHub) and makes the case for why robust evaluations and more expressive user interfaces are the most critical components for AI-native development tools to succeed in the near future. The complete show notes for this episode can be found at https://twimlai.com/go/752.

Dataflow Computing for AI Inference with Kunle Olukotun - #751

2025-10-14 | 56:37

In this episode, we're joined by Kunle Olukotun, professor of electrical engineering and computer science at Stanford University and co-founder and chief technologist at SambaNova Systems, to discuss reconfigurable dataflow architectures for AI inference. Kunle explains the core idea of building computers that are dynamically configured to match the dataflow graph of an AI model, moving beyond the traditional instruction-fetch paradigm of CPUs and GPUs. We explore how this architecture is well-suited for LLM inference, reducing memory bandwidth bottlenecks and improving performance. Kunle reviews how this system also enables efficient multi-model serving and agentic workflows through its large, tiered memory and fast model-switching capabilities. Finally, we discuss his research into future dynamic reconfigurable architectures, and the use of AI agents to build compilers for new hardware. The complete show notes for this episode can be found at https://twimlai.com/go/751.

Recurrence and Attention for Long-Context Transformers with Jacob Buckman - #750

2025-10-07 | 56:53

Today, we're joined by Jacob Buckman, co-founder and CEO of Manifest AI, to discuss achieving long context in transformers. We discuss the bottlenecks of scaling context length and recent techniques to overcome them, including windowed attention, grouped query attention, and latent space attention. We explore the idea of weight-state balance and the weight-state FLOP ratio as a way of reasoning about the optimality of compute architectures, and we dig into the Power Retention architecture, which blends the parallelization of attention with the linear scaling of recurrence and promises speedups of >10x during training and >100x during inference. We also review Manifest AI’s recent open source projects: Vidrial, a custom CUDA framework for building highly optimized GPU kernels in Python, and PowerCoder, a 3B-parameter coding model fine-tuned from StarCoder to use power retention. Our chat also covers the use of metrics like in-context learning curves and negative log likelihood to measure context utility, the implications of scaling laws, and the future of long context lengths in AI applications. The complete show notes for this episode can be found at https://twimlai.com/go/750.

The Decentralized Future of Private AI with Illia Polosukhin - #749

2025-09-30 | 01:04:03

In this episode, Illia Polosukhin, a co-author of the seminal "Attention Is All You Need" paper and co-founder of Near AI, joins us to discuss his vision for building private, decentralized, and user-owned AI. Illia shares his unique journey from developing the Transformer architecture at Google to building the NEAR Protocol blockchain to solve global payment challenges, and now applying those decentralized principles back to AI. We explore how Near AI is creating a decentralized cloud that leverages confidential computing, secure enclaves, and the blockchain to protect both user data and proprietary model weights. Illia also shares his three-part approach to fostering trust: open model training to eliminate hidden biases and "sleeper agents," verifiability of inference to ensure the model runs as intended, and formal verification at the invocation layer to enforce composable guarantees on AI agent actions. Finally, Illia shares his perspective on the future of open research, the role of tokenized incentive models, and the need for formal verification in building compliance and user trust. The complete show notes for this episode can be found at https://twimlai.com/go/749.

Inside Nano Banana 🍌 and the Future of Vision-Language Models with Oliver Wang - #748

2025-09-23 | 01:03:09

Today, we’re joined by Oliver Wang, principal scientist at Google DeepMind and tech lead for Gemini 2.5 Flash Image—better known by its code name, “Nano Banana.” We dive into the development and capabilities of this newly released frontier vision-language model, beginning with the broader shift from specialized image generators to general-purpose multimodal agents that can use both visual and textual data for a variety of tasks. Oliver explains how Nano Banana can generate and iteratively edit images while maintaining consistency, and how its integration with Gemini’s world knowledge expands creative and practical use cases. We discuss the tension between aesthetics and accuracy, the relative maturity of image models compared to text-based LLMs, and scaling as a driver of progress. Oliver also shares surprising emergent behaviors, the challenges of evaluating vision-language models, and the risks of training on AI-generated data. Finally, we look ahead to interactive world models and VLMs that may one day “think” and “reason” in images. The complete show notes for this episode can be found at https://twimlai.com/go/748.

Is It Time to Rethink LLM Pre-Training? with Aditi Raghunathan - #747

2025-09-16 | 58:29

Today, we're joined by Aditi Raghunathan, assistant professor at Carnegie Mellon University, to discuss the limitations of LLMs and how we can build more adaptable and creative models. We dig into her ICML 2025 Outstanding Paper Award winner, “Roll the dice & look before you leap: Going beyond the creative limits of next-token prediction,” which examines why LLMs struggle with generating truly novel ideas. We dig into the "Roll the dice" approach, which encourages structured exploration by injecting randomness at the start of generation, and the "Look before you leap" concept, which trains models to take "leaps of thought" using alternative objectives to create more diverse and structured outputs. We also discuss Aditi’s papers exploring the counterintuitive phenomenon of "catastrophic overtraining," where training models on more data improves benchmark performance but degrades their ability to be fine-tuned for new tasks, and dig into her lab's work on creating more controllable and reliable models, including the concept of "memorization sinks," an architectural approach to isolate and enable the targeted unlearning of specific information. The complete show notes for this episode can be found at https://twimlai.com/go/747.

Building an Immune System for AI Generated Software with Animesh Koratana - #746

2025-09-09 | 01:04:41

Today, we're joined by Animesh Koratana, founder and CEO of PlayerZero, to discuss his team’s approach to making agentic and AI-assisted coding tools production-ready at scale. Animesh explains how rapid advances in AI-assisted coding have created an “asymmetry” where the speed of code output outpaces the maturity of processes for maintenance and support. We explore PlayerZero’s debugging and code verification platform, which uses code simulations to build a "memory bank" of past bugs and leverages an ensemble of LLMs and agents to proactively simulate and verify changes, predicting potential failures. Animesh also unpacks the underlying technology, including a semantic graph that analyzes code bases, ticketing systems, and telemetry to trace and reason through complex systems, test hypotheses, and apply reinforcement learning techniques to create an “immune system” for software. Finally, Animesh shares his perspective on the future of the software development lifecycle (SDLC), rethinking organizational workflows, and ensuring security as AI-driven tools continue to mature. The complete show notes for this episode can be found at https://twimlai.com/go/746.

Autoformalization and Verifiable Superintelligence with Christian Szegedy - #745

2025-09-02 | 01:11:18

In this episode, Christian Szegedy, Chief Scientist at Morph Labs, joins us to discuss how the application of formal mathematics and reasoning enables the creation of more robust and safer AI systems. A pioneer behind concepts like the Inception architecture and adversarial examples, Christian now focuses on autoformalization—the AI-driven process of translating mathematical concepts from their human-readable form into rigorously formal, machine-verifiable logic. We explore the critical distinction between the informal reasoning of current LLMs, which can be prone to errors and subversion, and the provably correct reasoning enabled by formal systems. Christian outlines how this approach provides a robust path toward AI safety and also creates the high-quality, verifiable data needed to train models capable of surpassing human scientists in specialized domains. We also delve into his predictions for achieving this superintelligence and his ultimate vision for AI as a tool that helps humanity understand itself. The complete show notes for this episode can be found at https://twimlai.com/go/745.

Multimodal AI Models on Apple Silicon with MLX with Prince Canuma - #744

2025-08-26 | 01:09:50

Today, we're joined by Prince Canuma, an ML engineer and open-source developer focused on optimizing AI inference on Apple Silicon devices. Prince shares his journey to becoming one of the most prolific contributors to Apple’s MLX ecosystem, having published over 1,000 models and libraries that make open, multimodal AI accessible and performant on Apple devices. We explore his workflow for adapting new models in MLX, the trade-offs between the GPU and Neural Engine, and how optimization methods like pruning and quantization enhance performance. We also cover his work on "Fusion," a weight-space method for combining model behaviors without retraining, and his popular packages—MLX-Audio, MLX-Embeddings, and MLX-VLM—which streamline the use of MLX across different modalities. Finally, Prince introduces Marvis, a real-time speech-to-speech voice agent, and shares his vision for the future of AI, emphasizing the move towards "media models" that can handle multiple modalities, and more. The complete show notes for this episode can be found at https://twimlai.com/go/744.

Genie 3: A New Frontier for World Models with Jack Parker-Holder and Shlomi Fruchter - #743

2025-08-19 | 01:00:31

Today, we're joined by Jack Parker-Holder and Shlomi Fruchter, researchers at Google DeepMind, to discuss the recent release of Genie 3, a model capable of generating “playable” virtual worlds. We dig into the evolution of the Genie project and review the current model’s scaled-up capabilities, including creating real-time, interactive, and high-resolution environments. Jack and Shlomi share their perspectives on what defines a world model, the model's architecture, and key technical challenges and breakthroughs, including Genie 3’s visual memory and ability to handle “promptable world events.” Jack, Shlomi, and Sam share their favorite Genie 3 demos, and discuss its potential as a dynamic training environment for embodied AI agents. Finally, we explore future directions for Genie research. The complete show notes for this episode can be found at https://twimlai.com/go/743.

Closing the Loop Between AI Training and Inference with Lin Qiao - #742

2025-08-12 | 01:00:40

In this episode, we're joined by Lin Qiao, CEO and co-founder of Fireworks AI. Drawing on key lessons from her time building PyTorch, Lin shares her perspective on the modern generative AI development lifecycle. She explains why aligning training and inference systems is essential for creating a seamless, fast-moving production pipeline, preventing the friction that often stalls deployment. We explore the strategic shift from treating models as commodities to viewing them as core product assets. Lin details how post-training methods, like reinforcement fine-tuning (RFT), allow teams to leverage their own proprietary data to continuously improve these assets. Lin also breaks down the complex challenge of what she calls "3D optimization"—balancing cost, latency, and quality—and emphasizes the role of clear evaluation criteria to guide this process, moving beyond unreliable methods like "vibe checking." Finally, we discuss the path toward the future of AI development: designing a closed-loop system for automated model improvement, a vision made more attainable by the exciting convergence of open and closed-source model capabilities. The complete show notes for this episode can be found at https://twimlai.com/go/742.

Comments (27)

lawangl ang

comparing a compiler to vibe coding is not correct at least for now. The current language compiler is based on Math which we can have a tremendous trust in the correctness while vibe coding is a totally different thing. People without coding knowledge and training can be really hard to ask correct questions, review and debug code that is generated with hallucination

Dec 24th

Zak Andrews

I cannot recommend additional data management services enough. The experience of using these services has been nothing short of transformative. With the help of these services, I have been able to streamline my data storage and organization processes, making it easier than ever to access and analyze important information, for more information visit and read https://hitechglitz.com/why-do-you-need-additional-data-management-services/ . The level of efficiency and accuracy that these services provide is truly unparalleled. Trust me, investing in additional data management services will be one of the best decisions you make for your business.

Mar 31st

Soran Ghaderi

we want lyric text

Oct 26th

ali ghanbarzade

It was fantastic! Thank u very much!

Nov 21st

Hamed Gh

great

Aug 1st

Andrew Miller

As someone interested in both data science and agriculture, I found this podcast fascinating. The potential applications for AI in agriculture are vast and exciting, but as the podcast notes, high-quality data annotation is crucial to the success of these technologies. That's why I highly recommend checking out this article on https://www.waybinary.com/types-of-data-annotation-for-ai-applications/, which delves deeper into the importance of data annotation and the different techniques used in the field.

Apr 21st

10/10 podcast about an interesting topic. Today AI is everywhere and without proper data processing, it just can't function right. Additional to info here, check https://www.businessmodulehub.com/blog/advantages-of-data/. Some information overlaps with the podcast, but still, many new tips on annotation automation and quality control. Strongly recommend it to anyone interested in machine learning.

Apr 20th

Emilia Gray

Even though automation has improved over the years, it still lacks intelligence. Machine learning algorithms can organize data themselves by learning the ownership of specific data types, which makes automation more efficient, you can find good specialists in this field here https://indatalabs.com/services/machine-learning-consulting

May 24th

Flavio Coelho

what's ADP?

Dec 12th

Duncan Pullen

This was a simply amazing episode. so much depth of information about real life and life changing AI/ML

Nov 22nd

Daniel Sierra

Best podcast on machine learning an ai

May 27th

Özgür Yüksel

Thanks a lot for introducing us to the genius of our age. Tremendously inspiring.

Dec 11th

Glory Dey

A very good insightful episode, Maki Moussavi explains the various points in a lucid manner. Truly, we are the captain of our life's ship. We are responsible for our own emotions and actions. Being proactive rather than reactive is the key to success and happiness! I will be reading this book! Thanks for sharing this interesting podcast. Have a great day!

Oct 15th

I love this channel and all the great podcasts. The topics are very relevant and the speakers are well informed experts so the episodes are very educative. Only request, please change the opening music note of the podcast. It is very unpleasant tune sets a jarring effect right at the beginning. Otherwise all these episodes are very interesting in the field of innovations in Artificial Intelligence and Machine Learning! Regards!

Jun 25th

Billy Bloomer

so smart you can smell it

Jun 14th

raqueeb shaikh

great podcast

May 31st

Loza Boza

Phenomenal discussion. Thank you! Particularly enjoyed the parts on generative models and the link to Daniel Kahneman.

May 20th

simon abdou

Horrible Audio

May 9th

This is a very realistic and proper episode which explains quantum computing even as alone.

Apr 9th
