Building real-time voice applications with Live API

Update: 2025-08-06

Description

Shrestha Basu Mallick, one of the product leads for the Gemini API, joins host Logan Kilpatrick for a deep dive into the Gemini Live API, Google’s real-time, multimodal interface for developers. Learn how native audio, alongside new capabilities like proactive audio and async function calling, unlocks the unique power of audio as an interface.

Watch on YouTube: https://www.youtube.com/watch?v=4xlwlU6h-wM

0:00 - Intro
1:18 - Live API Overview
3:36 - Why audio is a special modality
5:07 - Speed vs. precision in audio
6:17 - Controllable and promptable TTS
8:31 - What developers are building with the Live API
11:14 - URL context and async calling features
15:02 - Proactive audio and affective dialog
16:55 - Addressing developer feedback
21:54 - Live API roadmap
23:49 - The role of long context
24:57 - What’s next for the Live API
26:41 - State of the AI audio market
30:10 - Advice for developers getting started with the Live API
31:16 - Live API demo
38:10 - Demo wrap up and closing

In Channel

Sundar Pichai: Gemini 3, Vibe Coding and Google's Full Stack Strategy

2025-11-26 (27:34)

Nano Banana Pro: Hands-on with the World’s Most Powerful Image Model

2025-11-26 (36:24)

Koray Kavukcuoglu: “This Is How We Are Going to Build AGI”

2025-11-25 (48:44)

Google Antigravity: Hands on with our new agentic development platform

2025-11-25 (44:49)

Gemini 3: Launch day reactions

2025-11-25 (42:16)

How a Moonshot Led to Google DeepMind's Veo 3

2025-10-16 (48:10)

GDM’s Pushmeet Kohli on solving science's biggest challenges with AI

2025-09-15 (37:28)

Behind the scenes of Google's state-of-the-art "nano-banana" image model

2025-08-27 (30:32)

Demis Hassabis on shipping momentum, better evals and world models

2025-08-11 (31:09)

Building real-time voice applications with Live API

2025-08-06 (40:14)

Building a frontier AI search experience

2025-07-23 (43:16)

Gemini's Multimodality

2025-07-02 (44:17)

Building Gemini's Coding Capabilities

2025-06-16 (01:00:27)

Sergey Brin on the Future of AI & Gemini

2025-06-16 (27:19)

Google I/O 2025 Recap with Josh Woodward and Tulsee Doshi

2025-05-22 (40:15)

Deep Dive into Long Context

2025-05-02 (59:32)

Launching Gemini 2.5

2025-03-28 (27:55)

Gemini app: Canvas, Deep Research and Personalization

2025-03-20 (36:53)

Developing Google DeepMind's Thinking Models

2025-02-24 (01:03:32)

Behind the Scenes of Gemini 2.0

2024-12-11 (35:18)
