Two Voice Devs

Mark and Allen talk about the latest news in the VoiceFirst world from a developer point of view.

Episode 255 - Agonizing About Agent-to-Agent

Join Allen Firstenberg and Noble Ackerson in a deep dive into the evolving world of AI agent protocols. In this episode of Two Voice Devs, they unpack the Agent-to-Agent (A2A) protocol, comparing it with the Model Context Protocol (MCP). They explore the fundamental differences, from A2A's conversational, stateful nature to MCP's function-call-like structure. The discussion also touches on the new Agent Payment Protocol (AP2) and its potential to revolutionize how AI agents interact and transact. Is A2A the key to unlocking a future of autonomous, ambient AI? Tune in to find out!

[00:01:00] What is the A2A protocol?
[00:04:00] A2A vs. Model Context Protocol (MCP)
[00:10:00] What does A2A bring that MCP doesn't?
[00:15:00] Ambient and Autonomous Agents
[00:19:00] A2A solves the "Tower of Babel" problem
[00:24:00] The difference between A2A and MCP: stateful vs. stateless
[00:27:00] Agent Payment Protocol (AP2)
[00:33:00] What does A2A promise for autonomous agents?
[00:38:00] Downsides and challenges of A2A
[00:44:00] Google, Gemini, and the future of A2A

#A2A #MCP #AI #ArtificialIntelligence #AgentToAgent #ModelContextProtocol #TwoVoiceDevs #TechPodcast #FutureOfAI #AutonomousAgents #AIAgents #AP2 #AgentPaymentProtocol #GoogleGemini #Anthropic
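For a concrete feel for how agents announce themselves under A2A, here is a minimal sketch of an agent card in TypeScript. The `AgentCard` interface below is a simplified, illustrative shape written for this example, not the official A2A type definitions, and the field names and well-known URL are assumptions based on the publicly described protocol.

```typescript
// Illustrative sketch only: a simplified agent card an A2A server might publish
// (for example at /.well-known/agent.json). Field names are assumptions, not
// the official A2A schema.
interface AgentSkill {
  id: string;
  name: string;
  description: string;
  tags: string[];
}

interface AgentCard {
  name: string;
  description: string;
  url: string;              // endpoint other agents use to open a task/conversation
  version: string;
  capabilities: { streaming: boolean; pushNotifications: boolean };
  defaultInputModes: string[];
  defaultOutputModes: string[];
  skills: AgentSkill[];
}

const card: AgentCard = {
  name: "Podcast Research Agent",
  description: "Finds and summarizes episodes on a given developer topic.",
  url: "https://example.com/a2a",   // hypothetical endpoint
  version: "0.1.0",
  capabilities: { streaming: true, pushNotifications: false },
  defaultInputModes: ["text/plain"],
  defaultOutputModes: ["text/plain"],
  skills: [
    {
      id: "find-episodes",
      name: "Find episodes",
      description: "Searches the back catalog for episodes matching a topic.",
      tags: ["search", "podcast"],
    },
  ],
};

console.log(JSON.stringify(card, null, 2));
```

Because the card is fetched over HTTP and the exchange is a multi-turn task rather than a single function call, A2A stays stateful and conversational, in contrast to MCP's stateless, tool-invocation style discussed in the episode.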

09-25
49:06

Episode 254 - Agent Frameworks Compared: Google's ADK vs LangChainJS

Allen and Mark are back to discuss AI agent frameworks again. This time, Allen compares Google's Agent Development Kit (ADK) with LangChainJS and LangGraphJS. He walks through building a simple agent in both frameworks, highlighting the differences in their approaches, from configuration by convention in ADK to the explicit configuration in LangGraph. They also explore the web-based testing environments for both, showing how each allows for debugging and inspecting the agent's behavior. The discussion also touches on the upcoming LangChain version 1.0 and its focus on backward compatibility.

[00:00:00] - Introduction
[00:01:09] - Comparing agent frameworks: Google's ADK and LangChainJS
[00:02:20] - A look at the ADK code
[00:06:55] - A look at the LangChainJS code
[00:13:20] - The web interface for testing
[00:19:10] - ADK's web interface
[00:22:30] - LangGraph's web interface
[00:27:20] - LangGraph's state management
[00:32:15] - Final thoughts

#AI #AgenticAI #GoogleADK #LangChain #LangGraph #JavaScript #Python #TwoVoiceDevs
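As a companion to the LangGraphJS side of the comparison, here is a minimal sketch of a prebuilt ReAct-style agent with one tool. It assumes the `@langchain/langgraph`, `@langchain/core`, and `@langchain/google-genai` packages; the option names and Gemini model id are assumptions that may differ across versions, and this is not the exact code walked through in the episode.

```typescript
// Minimal LangGraphJS agent sketch (not the episode's exact code).
// Assumes: npm i @langchain/langgraph @langchain/core @langchain/google-genai zod
import { createReactAgent } from "@langchain/langgraph/prebuilt";
import { ChatGoogleGenerativeAI } from "@langchain/google-genai";
import { tool } from "@langchain/core/tools";
import { z } from "zod";

// A single illustrative tool the agent can call.
const getShowLength = tool(
  async ({ episode }) => `Episode ${episode} runs about 33 minutes.`, // stubbed data
  {
    name: "get_show_length",
    description: "Returns the runtime of a Two Voice Devs episode.",
    schema: z.object({ episode: z.number() }),
  }
);

const llm = new ChatGoogleGenerativeAI({ model: "gemini-2.0-flash" }); // model id is an assumption

// Explicit configuration: the model and tools are wired together by hand,
// in contrast to ADK's configuration-by-convention approach.
const agent = createReactAgent({ llm, tools: [getShowLength] });

const result = await agent.invoke({
  messages: [{ role: "user", content: "How long is episode 254?" }],
});
console.log(result.messages.at(-1)?.content);
```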

09-18
33:21

Episode 253 - The Future of Voice? Exploring Gemini 2.5's TTS Model

In this episode of Two Voice Devs, Mark and Allen dive into the new experimental Text-to-Speech (TTS) model in Google's Gemini 2.5. They explore its capabilities, from single-speaker to multi-speaker audio generation, and discuss how it's a significant leap from the old days of SSML. They also touch on how this new technology can be integrated with LangChainJS to create more dynamic and natural-sounding voice applications. Is this the return of voice as the primary interface for AI?

[00:00:00] Introduction
[00:00:45] Google's new experimental TTS model for Gemini
[00:01:55] Demo of single-speaker TTS in Google's AI Studio
[00:03:05] Code walkthrough for single-speaker TTS
[00:04:30] Lack of fine-grained control compared to SSML
[00:05:15] Using text cues to shape the TTS output
[00:06:20] Demo of multi-speaker TTS with a script
[00:09:50] Code walkthrough for multi-speaker TTS
[00:11:30] The model is tuned for TTS, not general conversation
[00:12:10] Using a separate LLM to generate a script for the TTS model
[00:13:30] Code walkthrough of the two-function approach with LangChainJS
[00:16:15] LangChainJS integration details
[00:19:00] Is Speech Markdown still relevant?
[00:21:20] Latency issues with the current TTS model
[00:22:00] Caching strategies for TTS
[00:23:30] Voice as the natural UI for AI
[00:25:30] Outro

#Gemini #TTS #VoiceAI #VoiceFirst #AI #Google #LangChainJS #LLM #Developer #Podcast
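For those who want to try the TTS model from code, here is a rough single-speaker sketch using the `@google/genai` JavaScript SDK. The model id and config field names are assumptions drawn from the public Gemini API docs at the time and may change while the model is experimental; the episode's own walkthrough may differ.

```typescript
// Rough single-speaker TTS sketch with @google/genai (field names are assumptions).
import { GoogleGenAI } from "@google/genai";
import { writeFileSync } from "node:fs";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

const response = await ai.models.generateContent({
  model: "gemini-2.5-flash-preview-tts",        // experimental TTS model id (assumption)
  contents: "Say cheerfully: Welcome back to Two Voice Devs!", // text cues shape delivery
  config: {
    responseModalities: ["AUDIO"],
    speechConfig: {
      voiceConfig: { prebuiltVoiceConfig: { voiceName: "Kore" } }, // one of the prebuilt voices
    },
  },
});

// The audio comes back as base64-encoded PCM in the first response part.
const audio = response.candidates?.[0]?.content?.parts?.[0]?.inlineData?.data;
if (audio) {
  writeFileSync("episode-intro.pcm", Buffer.from(audio, "base64"));
  console.log("Wrote raw PCM audio to episode-intro.pcm");
}
```

Note there is no SSML here: delivery is steered with plain-text cues ("Say cheerfully: ..."), which is exactly the loss of fine-grained control the episode discusses.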

08-29
25:40

Episode 252 - GPT-5 First Look: Evolution, Not Revolution

Join Allen and Mark as they take a first look at the newly released GPT-5 from OpenAI. They dive into the details of what's new, what's changed, and what's missing, frequently comparing it to other models like Google's Gemini. From the new mini and nano models to the pricing wars with competitors, they cover the landscape of the latest LLM offerings. They also discuss the new features for developers, including verbosity settings and constrained outputs with context-free grammars, and what this means for the future of AI development. Is GPT-5 the leap forward everyone was expecting, or a sign that the rapid pace of AI evolution is starting to plateau? Tune in to find out!

[00:00:00] Introduction and the hype around GPT-5
[00:01:00] Overview of GPT-5, mini, and nano models
[00:02:00] The new "thinking" model and smart routing
[00:03:00] Simplifying models for developers
[00:04:00] Reasoning levels vs. Gemini's "thinking budget"
[00:06:00] Pricing wars and new models
[00:07:00] OpenAI's new open source models
[00:08:00] New verbosity setting for developers
[00:09:00] Constrained outputs and context-free grammars
[00:12:00] Using LLMs to translate to well-defined data structures
[00:14:00] Reducing hallucinations and medical applications
[00:16:00] Knowledge cutoff dates for the new models
[00:18:00] Coding with GPT-5 and IDE integration
[00:19:00] More natural conversations with ChatGPT
[00:21:00] Missing audio and image modalities vs. Gemini
[00:22:00] Community reaction to the GPT-5 release
[00:24:00] The future of LLMs: Maturing and plateauing
[00:26:00] The need for better developer tools and agentic computing

#GPT5 #OpenAI #LLM #AI #ArtificialIntelligence #Developer #TechTalk #Podcast #AIDevelopment #MachineLearning #FutureOfAI #AGI #GoogleGemini #TwoVoiceDevs
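As a rough illustration of the developer-facing knobs discussed here, the sketch below calls the OpenAI Responses API with a verbosity setting and a minimal reasoning effort. The parameter names (`text.verbosity`, `reasoning.effort`) and the model id are assumptions based on OpenAI's published GPT-5 documentation and may not match the current API exactly.

```typescript
// Hedged sketch of GPT-5's verbosity and reasoning-effort settings via the
// OpenAI Responses API. Parameter names are assumptions from public docs.
import OpenAI from "openai";

const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

const response = await client.responses.create({
  model: "gpt-5",
  input: "Summarize what changed between GPT-4 and GPT-5 in two sentences.",
  text: { verbosity: "low" },        // terser output for developer tooling
  reasoning: { effort: "minimal" },  // trade reasoning depth for latency
});

console.log(response.output_text);
```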

08-15
27:35

Episode 251 - AI Agents: Frameworks and Concepts

Join Mark and Allen in this episode of Two Voice Devs as they explore the fascinating world of AI agents. They break down what agents are, how they work, and what sets them apart from earlier AI technologies. The discussion covers key concepts like "context engineering" and the essential components of an agentic system, including prompts, RAG, memory, tools, and structured outputs.

Using a practical example of a prescription management chatbot for veterans, they demonstrate how agents can handle complex tasks. They compare various frameworks for building agents, specifically focusing on OpenAI's Agent SDK (for TypeScript) and Microsoft's Semantic Kernel (for C#). They also touch on other popular frameworks like LangGraph and Google's Agent Development Kit.

Tune in for a detailed comparison of how OpenAI's Agent SDK and Microsoft's Semantic Kernel handle state, tools, and the overall agent lifecycle, and learn what the future holds for these intelligent systems.

[00:00:00] - Introduction
[00:01:02] - What is an AI Agent?
[00:03:12] - Context Engineering and its components
[00:06:02] - The role of the Agent Controller
[00:08:01] - Agent Mode vs. Agent AI
[00:09:36] - Use Case: Prescription Management Chatbot
[00:13:42] - Handling Large Lists of Data
[00:16:15] - Tools and State Management
[00:21:05] - Filtering and Searching with Tools
[00:27:08] - Displaying Information and Iterating through lists
[00:30:10] - The power of LLMs in Agentic Systems
[00:35:18] - Sub-agents and the future of agentic systems
[00:38:25] - Comparing different Agent Frameworks
[00:39:00] - Wrap up

#AIAgents #TwoVoiceDevs #ContextEngineering #OpenAIAgentSDK #SemanticKernel #LangGraph #GoogleADK #LLMs #GenAI #AI #Developer #Podcast #TypeScript #CSharp
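To make the tools-plus-structured-output idea concrete, here is a small framework-neutral sketch in TypeScript of the kind of prescription-search tool discussed in the episode. The names and data are hypothetical; both the OpenAI Agent SDK and Semantic Kernel have their own tool/plugin registration APIs, which this deliberately does not reproduce.

```typescript
// Framework-neutral sketch: a "tool" the agent controller can call to filter a
// large prescription list instead of stuffing it all into the prompt.
// All names and data here are hypothetical.
import { z } from "zod";

const SearchArgs = z.object({
  nameContains: z.string().optional(),  // e.g. "lisin"
  activeOnly: z.boolean().default(true),
});
type SearchArgs = z.infer<typeof SearchArgs>;

interface Prescription {
  id: string;
  name: string;
  refillsLeft: number;
  active: boolean;
}

const prescriptions: Prescription[] = [
  { id: "rx-001", name: "Lisinopril 10mg", refillsLeft: 2, active: true },
  { id: "rx-002", name: "Metformin 500mg", refillsLeft: 0, active: false },
];

// The tool returns structured output (not prose) so the controller can page,
// filter, or render the results without another LLM round trip.
function searchPrescriptions(raw: unknown): Prescription[] {
  const args: SearchArgs = SearchArgs.parse(raw); // validate the model's arguments
  return prescriptions.filter(
    (p) =>
      (!args.activeOnly || p.active) &&
      (!args.nameContains ||
        p.name.toLowerCase().includes(args.nameContains.toLowerCase()))
  );
}

console.log(searchPrescriptions({ nameContains: "lisin" }));
```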

08-12
39:22

Episode 250 - Five Years Up, Up, and Away in Voice & AI

Join Mark and Allen for a very special 250th episode as they celebrate five years of Two Voice Devs! You won't want to miss the unique, AI-animated opening that takes them to new heights, or the special closing that brings it all home, both created with the help of Veo 3. In between, they take a look back at the evolution of voice and AI technology. From the early days of Alexa and Google Assistant to the rise of LLMs and generative AI, they discuss the shifts in the industry, the enduring importance of context, and what the future might hold for agentic AI, security, and the developer experience.

[00:02:45] - Where did we think the industry would be in 5 years?
[00:05:30] - How LLMs and Generative AI changed the landscape
[00:11:05] - Context Engineering is the new Prompt Engineering
[00:14:30] - The explosion of frameworks, libraries, and models
[00:18:00] - The importance of guardrails and security
[00:22:30] - Where are things going in the near term?
[00:27:30] - The future of devices and developer platforms
[00:30:00] - Right-sizing models and the cost of AI
[00:33:30] - The importance of community and having fun

#TwoVoiceDevs #VoiceAI #ArtificialIntelligence #LLMs #GenerativeAI #AIAgents #VoiceFirst #TechPodcast #ConversationalAI #AICommunity #FutureOfTech #AIEthics #AISecurity #DeveloperExperience #HotAirBalloon #Veo3

07-31
36:14

Episode 249 - Cracking Copilot and the Mysteries of Microsoft 365

In this episode, guest host Andrew Connell, a Microsoft MVP of 21 years, joins Allen to unravel the complexities of Microsoft's AI strategy, particularly within the enterprise. They explore the world of Microsoft 365 Copilot, distinguishing it from the broader AI landscape and consumer tools like ChatGPT. Andrew provides an insider's look at how Copilot functions within a secure, private "enclave," leveraging a "Semantic Index" of your organization's data to provide relevant, contextual answers.

The conversation then shifts to the developer experience. Discover the different ways developers can extend and customize Copilot, from low-code solutions in Copilot Studio to creating powerful "declarative agents" with JSON and even building "custom engine agents" where you can bring your own models and infrastructure. If you've ever wondered what Microsoft's AI story is for businesses and internal developers, this episode provides a comprehensive and honest overview.

Timestamps:
[00:00:01] - Introducing guest host Andrew Connell
[00:00:54] - What is a Microsoft 365 developer?
[00:01:40] - Andrew's journey into the Microsoft ecosystem
[00:05:00] - 21 years as a Microsoft MVP
[00:06:15] - Enterprise Cloud vs. Developer Cloud
[00:08:06] - Microsoft's AI focus for the enterprise
[00:10:57] - What is Microsoft 365 Copilot?
[00:13:07] - How Copilot ensures data privacy with a "secure enclave"
[00:14:58] - Understanding the Semantic Index
[00:16:31] - Is Copilot a Retrieval Augmented Generation (RAG) system?
[00:17:23] - Responsible AI in the Copilot stack
[00:19:19] - The developer story for extending Copilot
[00:22:43] - Building declarative agents with JSON and YAML
[00:25:05] - Using actions and tools with agents
[00:27:00] - How agents are deployed via Microsoft Teams
[00:32:48] - Where does Copilot actually run?
[00:36:20] - Key takeaways from Microsoft Build
[00:41:20] - The spectrum of development: low-code to full-code
[00:43:00] - Full control with Custom Engine Agents
[00:49:30] - Where to find Andrew Connell online

Hashtags:
#Microsoft #AI #Copilot #Microsoft365 #Azure #SharePoint #MicrosoftTeams #MVP #Developer #Podcast #Tech #EnterpriseSoftware #CloudComputing #ArtificialIntelligence #Agents #LowCode #NoCode #RAG

07-24
52:07

Episode 248 - AI Showdown: Gemini CLI vs. Claude Code CLI

Join Allen Firstenberg and guest host Isaac Johnson, a Google Developer Expert with a deep background in DevOps and SRE, as they dive into the world of command-line AI assistants. In this episode, they compare and contrast two powerful tools: Anthropic's Claude Code CLI and Google's Gemini CLI.

Isaac shares his journey from coding with Fortran in the 90s to becoming a GDE, and explains why he often prefers the focused, context-aware power of a CLI tool over crowded IDE integrations. They discuss the pros and cons of each approach, from ease of use and learning curves to the critical importance of using version control as a safety net.

The conversation then gets practical with a live demo where both Claude and Gemini are tasked with generating system architecture diagrams for a real-world project. Discover the differences in speed, cost, output, and user experience. Plus, learn how to customize Gemini's behavior with `GEMINI.md` files and explore fascinating use cases beyond just writing code, including podcast production, image generation, and more.

[00:00:30] - Introducing the topic: AI assistants in the command line.
[00:01:00] - Guest Isaac Johnson's extensive background in tech.
[00:03:00] - Why use a CLI tool instead of an IDE plugin?
[00:07:30] - Pro Tip: Always use Git with AI coding tools!
[00:09:30] - The cost of AI: Comparing Claude's and Gemini's pricing.
[00:12:15] - The benefits of Gemini CLI being open source.
[00:17:30] - Live Demo: Claude Code CLI generates a system diagram.
[00:21:30] - Live Demo: Gemini CLI tackles the same task.
[00:27:30] - Customizing your AI with system prompts (`GEMINI.md`).
[00:31:30] - Beyond Code: Using CLI tools for podcasting and media generation.
[00:40:30] - Where to find and connect with Isaac Johnson.

#AI #DeveloperTools #CLI #Gemini #Claude #GoogleCloud #Anthropic #TwoVoiceDevs #TechPodcast #SoftwareDevelopment #DevOps #SRE #AIassistant #Coding #Programming #FirebaseStudio #Imagen #Veo
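As a small example of the customization discussed here, a project-level `GEMINI.md` is just a Markdown file of instructions that the Gemini CLI folds into its context. The contents below are purely illustrative of the kind of guidance you might put there, not a prescribed format.

```markdown
# Project conventions for Gemini CLI (illustrative example)

- This repo is a TypeScript monorepo; prefer `pnpm` commands over `npm`.
- Generate architecture diagrams as Mermaid files in `docs/diagrams/`.
- Never modify files under `infra/` without asking first.
- Run `pnpm test` before proposing a commit message.
```

Paired with the Git safety net Isaac recommends, a file like this keeps the assistant's changes predictable and easy to review or revert.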

07-17
41:31

Episode 247 - Apple's AI Gets Serious

John Gillilan, our official Apple correspondent, returns to Two Voice Devs to unpack the major announcements from Apple's latest Worldwide Developer Conference (WWDC). After failing to ship the ambitious "Apple Intelligence" features promised last year, how did Apple address the elephant in the room? We dive deep into the new "Foundation Models Framework," which gives developers unprecedented access to on-device LLMs. We explore how features like structured data output with the "Generable" macro, "Tools" for app integration, and trainable "Adapters" are changing the game for developers. We also touch on the revamped speech-to-text, "Visual Intelligence," "Swift Assist" in Xcode, and the mysterious "Private Cloud Compute." Join us as we analyze Apple's AI strategy, the internal reorgs shaping their product future, and the competitive landscape with Google and OpenAI.

[00:00:00] Welcome back, John Gillilan!
[00:01:00] What was WWDC like from an insider's perspective?
[00:06:00] Apple's big miss: What happened to last year's AI promises?
[00:12:00] The new Foundation Models Framework
[00:16:00] Structured data output with the "Generable" macro
[00:19:00] Extending the LLM with "Tools"
[00:22:00] Fine-tuning with trainable "Adapters"
[00:28:00] Modernized on-device Speech-to-Text
[00:29:00] "Visual Intelligence" and app integration
[00:32:00] The powerful "call model" block in Shortcuts
[00:36:00] Swift Assist and BYO-Model in Xcode
[00:39:00] Inside Apple's big AI reorg
[00:42:00] The Jony Ive / OpenAI hardware mystery
[00:45:00] How Apple, Google, and OpenAI will compete and collaborate

#Apple #WWDC #AI #AppleIntelligence #FoundationModels #LLM #OnDeviceAI #Swift #iOSDev #Developer #TechPodcast #TwoVoiceDevs #Siri #SwiftAssist #OpenAI #GoogleGemini #GoogleAndroid

07-10
48:35

Episode 246 - Reasoning About Gemini 2.5 "Thinking" Model

Join Allen Firstenberg and Mark Tucker as they dive into Google's latest Gemini 2.5 models and their much-touted "thinking" capabilities. In this episode, they explore whether these models are genuinely reasoning or just executing sophisticated pattern matching. Through live tests in Google's AI Studio, they pit the Pro, Flash, and Flash-Lite models against tricky riddles, analyzing the "thought process" behind the answers. The discussion also covers the practical implications for developers, the challenges of implementing these features in frameworks like LangChainJS, and the broader question of what this means for the future of AI.

[00:00:00] - Introduction to Gemini 2.5 "thinking" models
[00:01:00] - How "thinking" models relate to Chain of Thought prompting
[00:03:00] - Advantages of separating reasoning from the answer
[00:05:00] - Exploring the models (Pro, Flash, Flash-Lite) in AI Studio
[00:06:00] - Thinking mode and thinking budget explained
[00:09:00] - Test 1: Strawberry vs. Triangle
[00:15:00] - Test 2: The "bricks vs. feathers" riddle with a twist
[00:17:00] - Prompting the model to ask clarifying questions
[00:25:00] - Is it reasoning or just pattern matching?
[00:28:00] - Practical applications and the future of these models
[00:35:00] - Implementing reasoning models in LangChainJS
[00:40:00] - Conclusion

#AI #GoogleGemini #ReasoningModels #ThinkingModels #LLM #ArtificialIntelligence #MachineLearning #LangChain #Developer #Podcast #TechTalk #TwoVoiceDevs
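For developers who want to poke at the thinking budget themselves, here is a small sketch using the `@google/genai` SDK. The `thinkingConfig` field names and the model id are assumptions based on the public Gemini API docs and may differ from what is shown in the episode.

```typescript
// Sketch: asking Gemini 2.5 Flash for a thought summary with a capped thinking budget.
// Field names (thinkingConfig, thinkingBudget, includeThoughts) are assumptions.
import { GoogleGenAI } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

const response = await ai.models.generateContent({
  model: "gemini-2.5-flash",
  contents: "Which weighs more: a pound of bricks or two pounds of feathers?",
  config: {
    thinkingConfig: {
      thinkingBudget: 1024,   // cap the tokens spent on reasoning
      includeThoughts: true,  // return a thought summary alongside the answer
    },
  },
});

// Separate the reasoning summary from the final answer.
for (const part of response.candidates?.[0]?.content?.parts ?? []) {
  console.log(part.thought ? `[thought] ${part.text}` : part.text);
}
```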

07-03
40:47

Episode 245 - From Python to TypeScript: Coding JCrew AI to Build Better Agents

Ever find that the best way to understand a new framework is to build it yourself? In this episode of Two Voice Devs, Mark Tucker takes us on a deep dive into Crew AI, a powerful Python framework for orchestrating multi-agent AI systems.

To truly get under the hood, Mark decided to port the core functionality into TypeScript, creating "JCrew AI." This process provides a unique and insightful perspective on how these agent-based systems are designed. Join us as we deconstruct the core concepts of Crew AI, exploring how it simplifies the complex process of making AI agents collaborate effectively. We discuss everything from the fundamental building blocks—like agents, tasks, and crews—to the clever ways it implements prompt engineering best practices.

If you're a developer interested in the architecture of modern AI applications, you'll gain a clear understanding of how to define agent roles, backstories, and goals; how to chain tasks together; and how the underlying execution loop (and its similarity to the ReAct pattern) works to produce cohesive results.

Timestamps:
[00:00:00] - Introduction
[00:01:00] - What is Crew AI and the "JCrew AI" Learning Project
[00:04:00] - Core Concepts: How Crews, Agents, and Tasks Work
[00:06:00] - Anatomy of a Crew AI Agent (Role, Goal, Backstory)
[00:10:00] - Building Prompts with Templates and "Slices"
[00:15:00] - The Execution Flow: From "Kickoff" to Final Output
[00:21:00] - Under the Hood: The Agent Executor and Core Logic Loop
[00:23:00] - How Crew AI Compares to LangChain and LangGraph
[00:28:00] - Practical Considerations: Human-in-the-Loop and Performance
[00:30:00] - Learning a Framework by Rebuilding It

#AI #ArtificialIntelligence #Developer #SoftwareEngineering #CrewAI #MultiAgentSystems #AIAgents #Python #TypeScript #PromptEngineering #LLM #Podcast
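To give a flavor of the concepts covered, here is a tiny TypeScript sketch of agents, tasks, and a crew in the spirit of Mark's JCrew AI port. These interfaces are written for this example only and are not JCrew AI's or Crew AI's actual API.

```typescript
// Illustrative only: simplified agent/task/crew shapes echoing the Crew AI
// concepts discussed (role, goal, backstory; tasks chained in order).
// Not the actual JCrew AI or Crew AI API.
interface Agent {
  role: string;
  goal: string;
  backstory: string;
}

interface Task {
  description: string;
  expectedOutput: string;
  agent: Agent;
}

interface Crew {
  agents: Agent[];
  tasks: Task[];
}

const researcher: Agent = {
  role: "Podcast Researcher",
  goal: "Collect accurate facts about a given episode topic",
  backstory: "A meticulous researcher who always cites sources.",
};

const writer: Agent = {
  role: "Show Notes Writer",
  goal: "Turn research into concise, engaging show notes",
  backstory: "A copywriter who loves developer audiences.",
};

const crew: Crew = {
  agents: [researcher, writer],
  tasks: [
    { description: "Research multi-agent frameworks", expectedOutput: "Bullet list of facts", agent: researcher },
    { description: "Draft 150-word show notes", expectedOutput: "One paragraph", agent: writer },
  ],
};

// A real kickoff would run each task through an LLM executor loop (ReAct-style);
// here we simply print the plan in order.
crew.tasks.forEach((t, i) => console.log(`${i + 1}. [${t.agent.role}] ${t.description}`));
```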

06-26
33:18

Episode 244 - What's New With Anthropic?

What do Anthropic's latest announcements mean for developers? In this episode, Allen is joined by freelance conversation designer Valentina Adami to break down all the major news from the recent "Code with Claude" event.

Valentina shares her hands-on experience and perspective on the new Opus 4 and Sonnet 4 models, discussing their distinct capabilities, the new "reasoning" features, and why Anthropic's transparency with its public system prompt is a game-changer. They also explore Claude Code, the new coding assistant that runs in your terminal, and how it can be used for everything from fixing bugs to learning new frameworks.

Finally, they cover the latest integrations for the Model Context Protocol (MCP) and the long-awaited addition of web searching to Claude, examining how these tools are evolving and what it means for the future of AI-assisted development.

Timestamps:
[00:41] Guest Valentina Adami's background in humanities and tech
[06:17] What's new in the Opus 4 and Sonnet 4 models?
[14:40] Are the models "thinking" or "reasoning"?
[19:27] The latest on MCP (Model Context Protocol) integrations
[25:03] Exploring the new coding assistant: Claude Code
[31:37] Claude can now search the web

#Anthropic #ClaudeAI #Opus4 #Sonnet4 #ThinkingAI #ReasoningAI #LLM #DeveloperTools #GenerativeAI #AI #Claude #CodingAssistant #MCP #ModelContextProtocol #TwoVoiceDevs

06-20
34:28

Episode 243 - AI Agents: Exploits, Ethics, and the Perils of Over-Permissive Tools

Join Allen Firstenberg and Michal Stanislawek in this thought-provoking episode of Two Voice Devs as they unpack two recent LinkedIn posts by Michal that reveal critical insights into the security and ethical challenges of modern AI agents.

The discussion kicks off with a deep dive into a concerning GitHub MCP server exploit, where researchers uncovered a method to access private repositories through public channels like PRs and issues. This highlights the dangers of broadly permissive AI agents and the need for robust guardrails and input sanitization, especially when vanilla language models are given wide-ranging access to sensitive data. What happens when your 'personal assistant' acts on a malicious instruction, mistaking it for a routine task?

The conversation then shifts to the ethical landscape of AI, exploring Anthropic's Claude 4 experiments which suggest that AI assistants, under certain conditions, might prioritize self-preservation or even 'snitch.' This raises profound questions for developers and users alike: How ethical do we want our agents to be? Who do they truly work for – us or the corporation? Could governments compel AI to reveal sensitive information?

Allen and Michal delve into the implications for developers, stressing the importance of building specialized agents with clear workflows, implementing principles of least privilege, and rethinking current authorization protocols like OAuth to support fine-grained permissions. They argue that we must consider the AI itself as the 'user' of our tools, necessitating a fundamental shift in how we design and secure these increasingly autonomous systems.

This episode is a must-listen for any developer building with AI, offering crucial perspectives on how to navigate the complex intersection of AI capabilities, security vulnerabilities, and ethical responsibilities.

More Info:
* https://www.linkedin.com/posts/xmstan_the-researchers-who-unveiled-claude-4s-snitching-activity-7333733889942691840-wAQ4
* https://www.linkedin.com/posts/xmstan_your-ai-assistant-may-accidentally-become-activity-7333219169888305152-2cjN

00:00 - Introduction: Unpacking AI Agent Security & Ethics
00:50 - The GitHub MCP Server Exploit: Public Access to Private Repos
02:15 - Ethical AI: Self-Preservation & The 'Snitching' Agent Dilemma
04:00 - Developer Responsibility: Building Ethical & Trustworthy AI Systems
09:20 - The Dangers of Vanilla LLM Integrations Without Guardrails
13:00 - Custom Workflows vs. Generic Autonomous Agents
17:20 - Isolation of Concerns & Principles of Least Privilege
26:00 - Rethinking OAuth: The Need for Fine-Grained AI Permissions
29:00 - The Holistic Approach to AI Security & Authorization

#AIAgents #AIethics #AIsecurity #PromptInjection #GitHub #ModelContextProtocol #MCP #MCPservers #MCPsecurity #OAuth #Authorization #Authentication #LeastPrivilege #Privacy #Security #Exploit #Hack #RedTeam #CovertChannel #Developer #TechPodcast #TwoVoiceDevs #Anthropic #ClaudeAI #LLM #LargeLanguageModel #GenerativeAI
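As a small illustration of the least-privilege idea, the sketch below gates an agent's tool calls through a per-agent allow list before anything executes. It is a hypothetical guardrail pattern written for this summary, not a specific MCP or OAuth mechanism.

```typescript
// Hypothetical guardrail: enforce a per-agent allow list before any tool runs.
// This illustrates least privilege; it is not a real MCP or OAuth mechanism.
type ToolName = "read_public_issues" | "read_private_repo" | "open_pull_request";

const allowLists: Record<string, ReadonlySet<ToolName>> = {
  "triage-agent": new Set<ToolName>(["read_public_issues"]),           // narrowly scoped
  "release-agent": new Set<ToolName>(["read_private_repo", "open_pull_request"]),
};

function authorizeToolCall(agentId: string, tool: ToolName): void {
  const allowed = allowLists[agentId];
  if (!allowed || !allowed.has(tool)) {
    // Fail closed: an unlisted agent/tool combination is always rejected.
    throw new Error(`Agent "${agentId}" is not permitted to call "${tool}"`);
  }
}

// The triage agent may read public issues...
authorizeToolCall("triage-agent", "read_public_issues");
// ...but a prompt-injected request for private data is refused.
try {
  authorizeToolCall("triage-agent", "read_private_repo");
} catch (err) {
  console.error((err as Error).message);
}
```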

06-12
30:57

Episode 242 - From the Creatives Corner at I/O 2025

Join Allen Firstenberg and Linda Lawton of Two Voice Devs as they record live from Google I/O 2025! As the conference nears its end, they dive deep into the groundbreaking announcements in generative AI, discussing the latest advancements and what they mean for developers, especially those in Conversational AI.

This episode explores the new and updated models that are set to redefine content creation:

Lyria: Google's innovative streaming audio generation API, its unique WebSocket-based approach, and the fascinating possibilities (and challenges!) of dynamic music creation, including its potential for YouTube content and the ever-present copyright questions surrounding AI-generated media.

Veo 3: The video generation powerhouse, now enhanced with synchronized audio and voice, realistic lip-sync for characters (yes, even cartoon animals!), and improvements in "world physics." They also tackle the implications of its pricing for professional and individual creators.

Imagen 4: Discover the highly anticipated improvements in text generation within images, including stylized fonts and potential for other languages.

Allen and Linda also share some early creations with these new models.

Whether you're building the next great voice app, creating dynamic content, or just curious about the cutting edge of AI, this episode offers a developer-focused perspective on the future of generative media.

00:00:00: Introduction to Two Voice Devs at I/O 2025
00:00:50: I/O 2025: New Generative AI Models Overview
00:01:20: Lyria: Streaming Audio Generation and Documentation Challenges
00:03:00: Lyria's Practical Use Cases & Generative AI Copyright Questions
00:10:00: Veo 3: Video Generation with Synchronized Audio and Voice Features
00:12:10: Veo 3 Pricing and Cost Implications for Developers
00:14:20: Imagen 4: Improved Text Generation in Images
00:17:40: Professional Use Cases for Veo and Imagen
00:19:10: Flow: The New Professional Studio System for Creators
00:22:00: Gemini Ultra Tiered Pricing and Regional Restrictions
00:24:20: Concluding Thoughts and Call to Action

#GoogleIO2025 #GenerativeAI #AIModels #Lyria #Veo3 #Imagen4 #FlowAI #TwoVoiceDevs #VoiceTech #ConversationalAI #AIDevelopment #MachineLearning #ContentCreation #YouTubeCreators #GoogleAI #VertexAI #GeminiUltra #CopyrightAI #TechPodcast

06-06
25:09

Episode 241 - Google I/O 2025: AI Highlights, Human Augmentation, and The AGI Debate

Recorded live from the podcast space at I/O, Allen Firstenberg and Roya dive into the overwhelming, yet incredibly exciting, world of AI announcements permeating the conference this year.

They discuss the pervasive theme of AI augmenting human intelligence rather than replacing it, exploring concrete examples across various domains. From breakthroughs in mathematics with AlphaProof to the efficiency gains of the new Gemma 3 model (running on small devices with a tiny memory footprint and reduced environmental impact), they cover the cutting edge of AI research and application.

Discover how models like CoScientists and Notebook LM are revolutionizing research and productivity (including generating podcasts from your notes!), the advancements in Gemini's audio output for more natural and multilingual conversations, and the potential for intelligence explosion with Alpha Evolve. Allen and Roya also unpack the fascinating Gemini Diffusion model's application to text and code generation and the critical role of AI in healthcare with the Amy model.

The conversation wouldn't be complete without tackling the big question: the AGI (Artificial General Intelligence) debate. Is it coming soon, or is it still a distant concept? Join Allen and Roya for their perspectives straight from the heart of Google I/O.

Tune in to get a developer's perspective on the future of AI driven by the latest announcements from Google I/O!

00:00 - Intro & AI Everywhere at I/O
01:36 - The Core Theme: AI Augments Humans
01:55 - AI in Math: AlphaProof
04:30 - Gemma 3: Small, Efficient, Open Models
07:07 - AI for Researchers: CoScientists & Notebook LM
10:05 - Enhanced Audio: Gemini Voice & Translation
12:09 - Alpha Evolve: Feedback Loops & Intelligence Explosion
14:15 - Gemini Diffusion: Diffusion for Text & Code
21:11 - AI in Healthcare: The Amy Model
22:08 - The AGI Debate: Is it Coming?

#GoogleIO #IO2025 #AI #MachineLearning #DeepLearning #GeminiAI #GemmaAI #DiffusionModels #NotebookLM #HealthcareAI #AGI #ArtificialGeneralIntelligence #TwoVoiceDevs #TechPodcast #Developers #ConversationalAI

06-03
24:33

Episode 240: I/O Eyewear - From Google Glass to Gemini

The buzz from Google I/O 2025 is deafening, especially about the new smart glasses announcement! On this episode of Two Voice Devs, Allen Firstenberg and Noble Ackerson — former Google Glass Explorers themselves — dive deep into their first impressions of Google's Project Astra / Android XR / Gemini glasses prototype.

Drawing on their unique experience from the early days of Glass, Allen and Noble discuss the evolution of wearable computing, the collision of conversational AI (Gemini) and spatial computing (Android XR), and what this new device means for the future.

They share their thoughts on the hardware design, the user interface (is it Gemini, Android XR, or both?), and critically examine the product strategy compared to Glass and other devices like the Apple Vision Pro. Most importantly for developers, they ponder the crucial question: what is the developer story here? Is Google providing the necessary tools and documentation, or are we repeating past mistakes?

Tune in for a candid, experienced perspective on Google's latest foray into smart glasses and whether this iteration truly builds on the lessons learned from the past.

0:00:30 - Introduction: Google I/O buzz and the glasses question
0:01:16 - Remembering Google Glass: First impressions & the "art of the possible"
0:02:35 - From Glass to Assistant: The evolution of ubiquitous computing
0:03:42 - The Collision: Conversational AI meets Spatial Computing
0:03:58 - First Impressions: Trying on the new Google glasses prototype at I/O
0:04:25 - How Glass Shaped Us: Focusing on human factors and product strategy
0:05:44 - The "If You Build It They Will Come" Trap: Why problem-solving is key
0:07:48 - Contrasting with Apple Vision Pro & the "Start with VR" concern
0:09:14 - Breaking Down the Stack: Hardware, Android XR, and Gemini
0:14:24 - Hardware Deep Dive: Weight, balance, optics, and the lower display decision
0:18:38 - UI/Interaction Discussion: Gemini's role, gestures, voice/tap inputs
0:19:37 - The Developer Story: Lack of clarity and need for APIs/documentation
0:27:55 - Rapid Fire: Best thing & Biggest Irk point about the prototype
0:32:16 - The Big Question: Would we buy one today?
0:33:08 - Final Thoughts: Value proposition and learning from Glass

#AndroidXR #Gemini #GoogleGlass #GoogleIO #IO2025 #ProjectAstra #SmartGlasses #WearableTech #SpatialComputing #ConversationalAI #VoiceFirst #VoiceDevs #GlassExplorers #TechPodcast #DeveloperLife #HumanComputerInteraction #ProductStrategy #Google #GoogleDeepMind #DeepMind

05-29
34:05

Episode 239 - MCP: Hype, Security, and Real-World Use

Join us on Two Voice Devs as Allen Firstenberg talks with Rizel Scarlett, Tech Lead for Open Source Developer Relations at Block. Rizel shares her fascinating journey from psychology student to software engineer and now a leader in developer advocacy, highlighting her passion for teaching and creative problem-solving.

The conversation dives deep into Block's innovative open source work, particularly their AI agent called Goose, which leverages the Model Context Protocol (MCP). Rizel explains what MCP is, seeing it as an SDK or API for AI agents, and discusses the excitement around its potential to democratize coding and other tools for developers and non-developers alike, sharing compelling use cases like automating tasks in Google Docs and interacting with Blender.

However, the discussion doesn't shy away from the critical challenges facing MCP, especially concerning security. Rizel addresses concerns about trusting community-built MCP servers, potential vulnerabilities, and mitigation strategies like allow lists and building internal, vetted servers. They also explore the complexities of exposing large APIs, the demand for local AI for privacy, the current limitations of local models, and the user experience of installing and trusting MCP plugins.

Rizel shares examples of promising MCP servers, including those focused on "long-term memory" and, notably, a speech/voice-controlled coding server, bringing the conversation back to the show's roots in voice development and accessibility, touching upon the concept of temporary disability.

The episode concludes by reflecting on whether MCP is currently a "small, beginner solution" being hyped as a "massive, full-featured" one, the need for more honest conversations about its limitations, and the ongoing efforts within the community and companies like Block to improve the protocol, including discussions around official registries and easier installation methods like deep links.

Tune in for a candid look at the exciting, yet challenging, landscape of AI agents, MCP, and open source development.

More Info:
* Goose - https://github.com/block/goose
* Pieces for Developers - https://pieces.app/features/mcp
* Speech MCP - https://glama.ai/mcp/servers/@Kvadratni/speech-mcp

[00:00:48] Meet Rizel Scarlett & Her Career Journey (Psychology to Dev Advocacy)
[00:03:54] Introducing Block & Its Mission (Square, Cash App, etc.)
[00:04:58] Block's Open Source Division and the Goose AI Agent
[00:05:48] Diving into the Model Context Protocol (MCP)
[00:07:56] What is MCP? (SDK for Agents) & Exciting Use Cases (Democratization, non-developers)
[00:10:36] Major Security Concerns with MCP (Trust, vulnerabilities, typo squatting)
[00:11:48] Mitigation Strategies & Authentication (Allow Lists, Internal Servers, Vetting)
[00:17:59] The Current State of MCP: An Infancy Protocol
[00:20:09] Complexity & Context Window Challenges with MCP Servers
[00:23:14] User Demand for Local AI & Data Privacy
[00:25:31] User Experience of MCP Plugin Installation & Trust
[00:28:42] Examples of Useful MCP Servers (Pieces, Computer Controller, Speech)
[00:31:18] The Power of Voice-Controlled Coding (Accessibility, temporary disability)
[00:33:59] MCP: Hype vs. Reality & The Need for Honest Conversations
[00:36:00] Efforts to Improve MCP (Committees, Registries, Deep Links)

#developer #programming #tech #opensource #block #ai #aiagent #llm #mcp #modelcontextprotocol #devrel #developeradvocacy #security #cybersecurity #privacy #localai #remoteai #accessibility #voicecoding #rizelscarlett #gooseai
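For a hands-on feel for what an MCP server involves, here is a minimal TypeScript sketch using the `@modelcontextprotocol/sdk` package. The import paths and helper names follow the SDK's published examples at the time and may have shifted since; treat them as assumptions rather than a definitive recipe.

```typescript
// Minimal MCP server sketch exposing one tool over stdio.
// Package import paths are assumptions based on the SDK's published examples.
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

const server = new McpServer({ name: "episode-lookup", version: "0.1.0" });

// Register a single tool an agent like Goose could call.
server.tool(
  "episode_length",
  { episode: z.number() },                      // input schema for the tool
  async ({ episode }) => ({
    content: [{ type: "text", text: `Episode ${episode} runs about 41 minutes.` }], // stubbed answer
  })
);

// Serve over stdio so a local client (for example a desktop agent) can launch it.
await server.connect(new StdioServerTransport());
```

A server this small also makes the security discussion concrete: whatever the tool handler does runs with your local permissions, which is why allow lists and vetted internal servers come up in the episode.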

05-16
41:28

Episode 238 - LLM Benchmarking: What, Why, Who, and How

How do you know if a Large Language Model is good for your specific task? You benchmark it! In this episode, Allen speaks with Amy Russ about her fascinating career path from international affairs to data, and how that unique perspective now informs her work in LLM benchmarking.

Amy explains what benchmarking is, why it's crucial for both model builders and app developers, and how it goes far beyond simple technical tests to include societal, cultural, and ethical considerations like preventing harms.

Learn about the complex process involving diverse teams, defining fuzzy criteria, and the technical tools used, including data versioning and prompt template engines. Amy also shares insights on how to get involved in open benchmarking efforts and where to find benchmarks relevant to your own LLM projects.

Whether you're building models or using them in your applications, understanding benchmarking is key to finding and evaluating the best AI for your needs.

Learn More:
* ML Commons - https://mlcommons.org/

Timestamps:
00:18 Amy's Career Path (From Diplomacy to Data)
02:46 What Amy Does Now (Benchmarking & Policy)
03:38 Defining LLM Benchmarking
05:08 Policy & Societal Benchmarking (Preventing Harms)
07:55 The Need for Diverse Benchmarking Teams
09:55 Technical Aspects & Tooling (Data Integrity, Versioning)
10:50 Prompt Engineering & Versioning for Benchmarking
12:48 Preventing Models from Tuning to Benchmarks
15:30 Prompt Template Engines & Generating Prompts
17:10 Other Benchmarking Tools & Testing Nuances
19:10 Benchmarking Compared to Traditional QA
21:45 Evaluating Benchmark Results (Human & Metrics)
23:05 The Challenge of Establishing an Evaluation Scale
23:58 How to Get Started in Benchmarking (Volunteering, Organizations)
25:20 Open Benchmarks & Where to Find Them
26:35 Benchmarking Your Own Model or App
28:55 Why Benchmarking Matters for App Builders
29:55 Where to Learn More & Follow Amy

Hashtags:
#LLM #Benchmarking #AI #MachineLearning #GenAI #DataScience #DataEngineering #PromptEngineering #ModelEvaluation #TechPodcast #Developer #TwoVoiceDevs #MLCommons #QA
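As a toy illustration of the workflow described, the sketch below fills a versioned prompt template for a handful of cases and scores the responses with a simple exact-match metric. Everything here (the template, the scorer, the `callModel` stub) is hypothetical; real benchmarks like those from MLCommons are far more rigorous.

```typescript
// Toy benchmarking harness: versioned prompt template + a simple exact-match score.
// The callModel stub and all data are hypothetical placeholders.
interface BenchmarkCase {
  id: string;
  input: string;
  expected: string;
}

const TEMPLATE_VERSION = "v1.2.0"; // version prompts just like code and data
const template = (input: string) =>
  `Answer with a single word.\nQuestion: ${input}\nAnswer:`;

const cases: BenchmarkCase[] = [
  { id: "geo-1", input: "Capital of France?", expected: "Paris" },
  { id: "math-1", input: "Two plus two?", expected: "Four" },
];

// Stand-in for a real model call (hosted LLM API, local model, etc.).
async function callModel(prompt: string): Promise<string> {
  return prompt.includes("France") ? "Paris" : "5";
}

async function runBenchmark(): Promise<void> {
  let correct = 0;
  for (const c of cases) {
    const answer = (await callModel(template(c.input))).trim();
    const pass = answer.toLowerCase() === c.expected.toLowerCase();
    if (pass) correct++;
    console.log(`${c.id}: ${pass ? "PASS" : "FAIL"} (got "${answer}")`);
  }
  console.log(`Template ${TEMPLATE_VERSION}: ${correct}/${cases.length} correct`);
}

await runBenchmark();
```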

05-09
31:44

Episode 237 - Building Bridges with Developers

Join Allen Firstenberg from Google Cloud Next 2025 as he sits down with Ankur Kotwal, Google's Global Head of Cloud Advocacy. In this episode of Two Voice Devs, Allen and Ankur dive deep into the world of Developer Relations (DevRel) at Google, discussing its crucial role as a bridge connecting Google's product teams and engineers with the global developer community.

Ankur shares his fascinating personal journey, from coding BASIC as a child alongside his developer dad to leading a key part of Google Cloud's developer outreach. They explore the ever-evolving landscape of technology, using the metaphor of "waves" – from early desktop computing and the internet to mobile apps and the current tidal wave of AI and "vibe coding."

This conversation offers valuable insights for all developers navigating the pace of technological change. Discover what Developer Relations is and how it serves as that essential bridge, functioning bidirectionally (both outbound communication and inbound feedback). Learn about the importance of community programs like Google Developer Experts (GDEs), and how developers can effectively connect with DevRel teams to share their experiences and help shape the future of products. Ankur and Allen also reflect on the need for continuous learning, understanding underlying tech layers, and the shared passion that drives innovation in our industry.

Whether you're a long-time developer or just starting out, learn how to ride the waves, connect with peers, and make your voice heard in the developer ecosystem by engaging with the DevRel bridge.

More Info:
* Google Developers Program: https://goo.gle/google-for-developers

Timestamps:
00:49 - Ankur's Role as Global Head of Cloud Advocacy
01:48 - The Bi-directional Nature of Developer Relations
02:34 - Ankur's Journey into Tech and DevRel
09:47 - What is Developer Relations? (The DevRel Bridge Explained)
12:06 - The Value of Community and Google Developer Experts (GDEs)
14:08 - Allen's Motivation for Being a GDE
18:24 - Riding the Waves of Technological Change (AI, Vibe Coding)
20:37 - The Importance of Understanding Abstraction Layers
25:41 - How Developers Can Engage with the DevRel Bridge
30:50 - Providing Feedback: Does it Make a Difference?

Hashtags:
#DeveloperRelations #DevRel #GoogleCloud #CloudAdvocacy #DeveloperCommunity #TechEvolution #AI #ArtificialIntelligence #VibeCoding #GoogleGemini #SoftwareDevelopment #Programming #Google #GoogleCloudNext #GoogleDevRel #GDG #GDE #TwoVoiceDevs #Podcast #Developers

05-06
32:31

Episode 236 - AI, Agents, and Sphere Magic Live from Cloud Next 2025

Join Allen Firstenberg and Alice Keeler, the Two Voice Devs, live from Day 1 of Google Cloud Next 2025 in Las Vegas! In this episode, recorded amidst the energy of the show floor, Allen and Alice dive into the major announcements and highlights impacting developers, especially those interested in AI and conversational interfaces.

Alice, known as the "Queen of Spreadsheets" and a Google Developer Expert for Workspace and AppSheet, shares her unique perspective on using accessible tools like Apps Script for real-world solutions, contrasting it with the high-end tech on display.

They unpack the new suite of generative AI models announced, including Veo for video, Chirp 3 for audio, Lyria for sound generation, and updates to Imagen, all available on Vertex AI. They recount the breathtaking private premiere at Sphere, discussing how Google DeepMind's cutting-edge AI enhanced the classic Wizard of Oz film, expanding and interpolating scenes that never existed – and connect this advanced technology back to tools developers can use today.

A major focus is the new Agent Builder, a tool poised to revolutionize how developers create multimodal AI agents capable of natural voice, text, and image interactions, demonstrated through exciting examples. They discuss the accessibility of this tool for developers of all levels and its potential to automate tedious tasks and create entirely new user experiences.

Plus, they touch on the new Agent to Agent Protocol for complex AI workflows, updates to AI Studio, and the production readiness of the Gemini 2.0 Live API.

Get a developer's take on the biggest news from Google Cloud Next 2025 Day 1 and a look ahead to the developer keynote.

More Info:
* Google Developers Program: https://goo.gle/google-for-developers
* Next 2025 Announcements: https://cloud.google.com/blog/topics/google-cloud-next/google-cloud-next-2025-wrap-up

00:00:31 Welcome to Google Cloud Next 2025
00:01:18 Meet Alice Keeler: Math Teacher, GDE, and Apps Script Developer
00:03:44 Apps Script: Accessible Development & Real-World Solutions
00:05:40 Cloud Next 2025 Day 1 Keynote Highlights
00:06:18 New Generative AI Models: Veo (Video), Chirp 3 (Audio), Lyria (Sound), Imagen Updates
00:09:00 The Sphere Experience & DeepMind's Wizard of Oz AI Enhancement
00:14:00 From Hollywood Magic to Public Tools: Vertex AI Capabilities
00:16:30 Agent Builder: The Future of AI Agents & Accessible Development
00:23:37 Agent to Agent Protocol: Enabling Complex AI Workflows
00:25:20 Other Developer News: AI Studio Revamp & Gemini 2.0 Live API
00:26:30 Connecting with Experts & Discovering What's Next

#GoogleCloudNext #GCNext #LasVegasSphere #SphereLasVegas #TwoVoiceDevs #AI #GenerativeAI #VertexAI #Gemini #AgentBuilder #AppScript #Developers #LowCode #NoCode #AIInEducation #AIDevelopment #ConversationalAI #VoiceAI #MachineLearning #WizardOfOz

05-01
27:15
