Episode 253 - The Future of Voice? Exploring Gemini 2.5's TTS Model

Update: 2025-08-29

Description

In this episode of Two Voice Devs, Mark and Allen dive into the new experimental Text-to-Speech (TTS) model in Google's Gemini 2.5. They explore its capabilities, from single-speaker to multi-speaker audio generation, and discuss how it's a significant leap from the old days of SSML. They also touch on how this new technology can be integrated with LangChainJS to create more dynamic and natural-sounding voice applications. Is this the return of voice as the primary interface for AI?
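As a rough illustration of the multi-speaker setup described above, the TypeScript sketch below prepares a speaker-labeled script and a speaker-to-voice mapping without calling any API. It is not the code from the episode. The speakers, helper names (`buildTranscript`, `missingVoices`, `buildMultiSpeakerConfig`), and prompt wording are hypothetical. The config field names (`responseModalities`, `speechConfig`, `multiSpeakerVoiceConfig`, `speakerVoiceConfigs`, `prebuiltVoiceConfig`, `voiceName`) are assumed to follow the multi-speaker example in Google's Gemini API speech-generation documentation at the time of writing, so verify them against the SDK version you use.

```typescript
// One line of dialogue in a generated script.
interface ScriptLine {
  speaker: string;
  text: string;
}

// Render the script as "Speaker: text" lines, the labeled-transcript form
// that multi-speaker TTS prompts are written in.
function buildTranscript(lines: ScriptLine[]): string {
  return lines.map((line) => `${line.speaker}: ${line.text}`).join("\n");
}

// List speakers that appear in the script but have no voice assigned.
function missingVoices(lines: ScriptLine[], voices: Record<string, string>): string[] {
  const missing: string[] = [];
  for (const line of lines) {
    if (!(line.speaker in voices) && missing.indexOf(line.speaker) === -1) {
      missing.push(line.speaker);
    }
  }
  return missing;
}

// Build the generation config. ASSUMPTION: field names mirror the
// multi-speaker example in Google's Gemini API docs; check your SDK version.
function buildMultiSpeakerConfig(voices: Record<string, string>) {
  const speakerVoiceConfigs = Object.keys(voices).map((speaker) => ({
    speaker,
    voiceConfig: { prebuiltVoiceConfig: { voiceName: voices[speaker] } },
  }));
  return {
    responseModalities: ["AUDIO"],
    speechConfig: { multiSpeakerVoiceConfig: { speakerVoiceConfigs } },
  };
}

// Hypothetical script and voice mapping, e.g. produced by a separate LLM step.
const script: ScriptLine[] = [
  { speaker: "Host", text: "Welcome back to the show." },
  { speaker: "Guest", text: "Thanks for having me!" },
];
const voices: Record<string, string> = { Host: "Puck", Guest: "Kore" };

// Fail early if any speaker would be left without a voice.
const unvoiced = missingVoices(script, voices);
if (unvoiced.length > 0) {
  throw new Error(`No voice for: ${unvoiced.join(", ")}`);
}
const prompt = `TTS the following conversation between Host and Guest:\n${buildTranscript(script)}`;
const config = buildMultiSpeakerConfig(voices);
console.log(prompt);
console.log(JSON.stringify(config, null, 2));
```

In a real app, `prompt` and `config` would be sent to a TTS-capable Gemini model. Generating the script itself with a separate LLM, as the hosts discuss, keeps this step purely about formatting.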

[00:00:00] Introduction

[00:00:45] Google's new experimental TTS model for Gemini

[00:01:55] Demo of single-speaker TTS in Google's AI Studio

[00:03:05] Code walkthrough for single-speaker TTS

[00:04:30] Lack of fine-grained control compared to SSML

[00:05:15] Using text cues to shape the TTS output

[00:06:20] Demo of multi-speaker TTS with a script

[00:09:50] Code walkthrough for multi-speaker TTS

[00:11:30] The model is tuned for TTS, not general conversation

[00:12:10] Using a separate LLM to generate a script for the TTS model

[00:13:30] Code walkthrough of the two-function approach with LangChainJS

[00:16:15] LangChainJS integration details

[00:19:00] Is Speech Markdown still relevant?

[00:21:20] Latency issues with the current TTS model

[00:22:00] Caching strategies for TTS

[00:23:30] Voice as the natural UI for AI

[00:25:30] Outro
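Each request to the TTS model takes noticeable time, which is why latency and caching come up near the end of the episode. One common strategy, sketched below in TypeScript as an illustration rather than the approach discussed in the show, is to cache generated audio in memory. `Synthesize`, `TtsCache`, and the model name are hypothetical; the cache key combines model, voice, and text because changing any of them changes the audio.

```typescript
// ASSUMPTION: a hypothetical wrapper around whatever call actually produces
// audio (for example, a Gemini TTS request); it resolves to raw audio bytes.
type Synthesize = (text: string, voiceName: string) => Promise<Uint8Array>;

// In-memory cache for synthesized audio, keyed on model, voice, and exact text.
class TtsCache {
  private store = new Map<string, Promise<Uint8Array>>();

  constructor(private readonly model: string, private readonly synth: Synthesize) {}

  private key(text: string, voiceName: string): string {
    return JSON.stringify([this.model, voiceName, text]);
  }

  // Return audio for a phrase, calling the synthesizer only on a cache miss.
  // Storing the promise means concurrent requests share one synthesis call.
  getAudio(text: string, voiceName: string): Promise<Uint8Array> {
    const key = this.key(text, voiceName);
    const cached = this.store.get(key);
    if (cached) return cached;
    const audio = this.synth(text, voiceName);
    this.store.set(key, audio);
    // Evict failures so a later request retries instead of reusing the error.
    audio.catch(() => this.store.delete(key));
    return audio;
  }

  size(): number {
    return this.store.size;
  }
}

// Demo with a fake synthesizer that counts how often it is called.
let synthCalls = 0;
const fakeSynth: Synthesize = async (text, voiceName) => {
  synthCalls++;
  return new Uint8Array([text.length, voiceName.length]);
};
const ttsCache = new TtsCache("example-tts-model", fakeSynth);
ttsCache.getAudio("Welcome back!", "Kore");
ttsCache.getAudio("Welcome back!", "Kore"); // cache hit: no new synthesis
console.log(`synthesis calls: ${synthCalls}, cached phrases: ${ttsCache.size()}`);
```

Storing the promise rather than the finished bytes means two simultaneous requests for the same phrase trigger only one synthesis. A production version would likely persist audio to disk or object storage and cap the cache size.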

#Gemini #TTS #VoiceAI #VoiceFirst #AI #Google #LangChainJS #LLM #Developer #Podcast

In Channel

Episode 255 - Agonizing About Agent-to-Agent

2025-09-25 (49:06)

Episode 254 - Agent Frameworks Compared: Google's ADK vs LangChainJS

2025-09-18 (33:21)

Episode 253 - The Future of Voice? Exploring Gemini 2.5's TTS Model

2025-08-29 (25:40)

Episode 252 - GPT-5 First Look: Evolution, Not Revolution

2025-08-15 (27:35)

Episode 251 - AI Agents: Frameworks and Concepts

2025-08-12 (39:22)

Episode 250 - Five Years Up, Up, and Away in Voice & AI

2025-07-31 (36:14)

Episode 249 - Cracking Copilot and the Mysteries of Microsoft 365

2025-07-24 (52:07)

Episode 248 - AI Showdown: Gemini CLI vs. Claude Code CLI

2025-07-17 (41:31)

Episode 247 - Apple's AI Gets Serious

2025-07-10 (48:35)

Episode 246 - Reasoning About Gemini 2.5 "Thinking" Model

2025-07-03 (40:47)

Episode 245 - From Python to TypeScript: Coding JCrew AI to Build Better Agents

2025-06-26 (33:18)

Episode 244 - What's New With Anthropic?

2025-06-20 (34:28)

Episode 243 - AI Agents: Exploits, Ethics, and the Perils of Over-Permissive Tools

2025-06-12 (30:57)

Episode 242 - From the Creatives Corner at I/O 2025

2025-06-06 (25:09)

Episode 241 - Google I/O 2025: AI Highlights, Human Augmentation, and The AGI Debate

2025-06-03 (24:33)

Episode 240: I/O Eyewear - From Google Glass to Gemini

2025-05-29 (34:05)

Episode 239 - MCP: Hype, Security, and Real-World Use

2025-05-16 (41:28)

Episode 238 - LLM Benchmarking: What, Why, Who, and How

2025-05-09 (31:44)

Episode 237 - Building Bridges with Developers

2025-05-06 (32:31)

Episode 236 - AI, Agents, and Sphere Magic Live from Cloud Next 2025

2025-05-01 (27:15)

Mark and Allen