How AI Learned to Chat About Pictures: Inside the MoshiVis Model

Update: 2025-04-02

Description

How do you teach a sophisticated speech AI to understand and discuss images, especially when paired image-speech data is rare?

This episode unpacks MoshiVis, a new model that does just that. We explore the challenges of building vision-speech models and how MoshiVis addresses them with a one-stage training pipeline, synthetic dialogues, and efficient "perceptual augmentation" techniques built on the Moshi speech LLM.

Join us for a deep dive into the tech that lets AI see, speak, and converse fluidly about the visual world.

In Channel

LLM Evaluation - How We Really Know If AI Is Getting Smarter
2025-05-19, 25:44

Teaching LLMs to Plan: Logical CoT Instruction Tuning for Symbolic Planning
2025-10-05, 16:30

Five Orders of Magnitude: Analog Gain Cells Slash Energy and Latency for Ultra-Fast LLMs
2025-10-05, 17:22

The Great Undertraining: How a 70B Model Called Chinchilla Exposed the AI Industry's Billion-Dollar Mistake
2025-08-03, 13:37

RewardAnything: Generalizable Principle-Following Reward Models
2025-08-03, 20:40

AI That Evolves: Inside the Darwin Gödel Machine
2025-06-30, 28:32

The AI Reasoning Illusion: Why 'Thinking' Models Break Down
2025-06-14, 12:15

When AI Rewrites Its Own Code to Win: Agent of Change
2025-06-13, 13:18

Eureka: How AI Learned to Write Better Reward Functions Than Human Experts
2025-06-07, 20:54

AlphaEvolve: How Google's AI Now Evolves Code to Solve Decades-Old Puzzles & Optimize Our World
2025-06-04, 25:25

From QA to AI Improvement Engineer: Navigating the Shift in the AI Era
2025-05-05, 45:56

The Blueprint Behind Google—and the Future of AI Retrieval
2025-04-28, 18:43

Running Down a Dream: Bill Gurley’s Roadmap to a Career You Love
2025-04-20, 14:12

The Divine Discontent (Constructive Dissatisfaction): Inside Ogilvy's Creative Habits
2025-04-20, 10:49

RAG-MCP: Mitigating Prompt Bloat and Enhancing Tool Selection for LLM
2025-05-13, 13:45

DeepSeek Prover V2 - AI's New Frontier in Formal Mathematics
2025-05-12, 16:38

Defeating Prompt Injections by Design: The CaMeL Approach
2025-05-03, 28:33

Don’t Just Generate, Dominate: Your Generative AI Level Up Starts Now
2025-04-19, 07:45

How AI Learned to Chat About Pictures: Inside the MoshiVis Model
2025-04-02, 14:30

DeepSeek LLM: The Open Source AI Revolution
2025-01-24, 15:42

GenAI Level UP