How Cartesia Edges Out The Big Labs With Audio AI Models, with Founder and CEO, Karan Goel

Update: 2025-03-26

Description

What if AI could talk back instantly and naturally? In this episode, Karan Goel, Co-founder & CEO of Cartesia, joins Barr Yaron to unpack the future of voice AI, state space models (SSMs), and why audio is the next frontier in AI.


Karan shares the founding story behind Cartesia, explains how alternative architectures like Mamba enable ultra-efficient, low-latency inference, and walks through how his team is building the fastest text-to-speech model in the world while obsessing over every millisecond.


Whether you’re into model architectures, AI infrastructure, or the future of voice interfaces, this episode delivers technical depth, startup lessons, and a roadmap for what’s coming next.

This episode is broken down into the following chapters:


00:00 – Intro
01:06 – Karan’s journey from CMU PhD to startup founder
03:56 – Why Cartesia is built around state space models
06:49 – What makes SSMs different from transformers
09:14 – Why compression matters for long-running AI systems
11:13 – What data types SSMs are best (and worst) for
13:39 – Scaling SSMs: What’s possible and what’s missing
15:31 – Hardware, GPUs & why SSMs work well on existing infra
18:46 – Landing on audio: Cartesia’s first core modality
22:38 – Navigating the model vs. market debate in AI startups
26:36 – How Cartesia built Sonic, their ultra-low latency TTS model
28:17 – Why latency is the #1 challenge in voice AI
30:46 – Tricks vs. model-first thinking: Baking it into the model
34:01 – How Cartesia balances fast execution with deep research
36:26 – Building with part-time academic co-founders
38:13 – Yes, every employee gets a personal Yoshi
40:02 – Where voice AI is being adopted first (telephony + beyond)
42:24 – Multilingual modeling & the long tail of language
45:02 – Voice as a new computing interface
46:26 – Why voice notes are the future (and Barr’s hot take)
49:56 – How Cartesia evaluates its models
52:44 – How Karan has grown as a founder and leader


Subscribe to the Barrchives newsletter: https://www.barrchives.com/
Spotify: https://open.spotify.com/show/37O8Pb0LgqpqTXo2GZiPXf
Apple: https://podcasts.apple.com/us/podcast/barrchives/id1774292613
Twitter: https://x.com/barrnanas
LinkedIn: https://www.linkedin.com/in/barryaron/


Barr Yaron