Google AI: Release Notes

Author: Google AI
© 2024 Google
Description
Ever wondered what it's really like to build the future of AI? Join host Logan Kilpatrick for a deep dive into the world of Google AI, straight from the minds of the builders. We're pulling back the curtain on the latest breakthroughs, sharing the unfiltered stories behind the tech, and answering the questions you've been dying to ask.
Whether you're a seasoned developer or an AI enthusiast, this podcast is your backstage pass to the cutting edge of AI technology. Tune in for:
- Exclusive interviews with AI pioneers and industry leaders.
- In-depth discussions on the latest AI trends and developments.
- Behind-the-scenes stories and anecdotes from the world of AI.
- Unfiltered insights and opinions from the people shaping the future.
So, if you're ready to go beyond the headlines and get the real scoop on AI, join Logan Kilpatrick on Google AI: Release Notes.
15 Episodes
Pushmeet Kohli, Head of Science and Strategic Initiatives at Google DeepMind, joins host Logan Kilpatrick to explore the intersection of AI and scientific discovery. Learn how the team's unique problem-solving framework led to innovations like AlphaFold and AlphaEvolve, and how new tools like AI Co-scientist aim to democratize these types of breakthroughs for everyone.
Watch on YouTube: https://www.youtube.com/watch?v=o7mdsL6BHsk
Chapters:
0:00 - Intro
1:04 - Recent Alpha launches
02:15 - Framework for selecting research domains
06:21 - Scientific, commercial and social impact
15:00 - Wielding AGI for breakthroughs
16:48 - Tech transfer and team collaboration
19:46 - IMO Gold Medal
21:42 - Evaluating math proofs
22:55 - From specialized models to Deep Think
24:22 - Do math skills generalize?
25:53 - Generalizing the IMO model
27:43 - Democratizing AI science tools
30:09 - AI Co-scientist
35:17 - An API for science?
Join host Logan Kilpatrick in discussion with some of the minds behind Google's new state-of-the-art image model, Gemini 2.5 Flash. Product and research leads from the Gemini team break down the technology behind its key capabilities, including interleaved generation for complex edits and new approaches to achieving character consistency and pixel-perfect control. With Nicole Brichtova, Kaushik Shivakumar, Mostafa Dehghani and Robert Riachi.
Watch on YouTube:
Chapters:
0:37 - New model introduction
1:21 - Demo - Image Editing
3:44 - Text rendering capabilities
4:44 - Beyond human preference evals
6:44 - Text rendering as a proxy for quality
8:38 - Positive transfer between modalities
11:25 - Demo - Multi-turn, context aware image generation
13:54 - Pixel-perfect editing and character consistency
15:51 - Interleaved image generation
17:59 - Specialized vs. native models
19:52 - Understanding nuanced prompts
20:59 - User feedback shaping model development
22:37 - Improvements in character consistency
24:17 - More natural looking images from team collaboration
26:41 - What's next for image generation models
Demis Hassabis, CEO of Google DeepMind, sits down with host Logan Kilpatrick. In this episode, learn about the evolution from game-playing AI to today's thinking models, how projects like Genie 3 are building world models to help AI understand reality, and why new testing grounds like Kaggle's Game Arena are needed to evaluate progress on the path to AGI.
Watch on YouTube: https://www.youtube.com/watch?v=njDochQ2zHs
Chapters:
00:00 - Intro
01:16 - Recent GDM momentum
02:07 - Deep Think and agent systems
04:11 - Jagged intelligence
07:02 - Genie 3 and world models
10:21 - Future applications of Genie 3
13:01 - The need for better benchmarks and Kaggle Game Arena
19:03 - Evals beyond games
21:47 - Tool use for expanding AI capabilities
24:52 - Shift from models to systems
27:38 - Roadmap for Genie 3 and the omni model
29:25 - The quadrillion token club
Shrestha Basu Mallick, one of the product leads for the Gemini API, joins host Logan Kilpatrick for a deep dive into the Gemini Live API, Google's real-time, multimodal interface for developers. Learn how native audio, alongside new capabilities like proactive audio and async function calling, unlocks the unique power of audio as an interface.
Watch on YouTube: https://www.youtube.com/watch?v=4xlwlU6h-wM
Chapters:
0:00 - Intro
1:18 - Live API Overview
3:36 - Why audio is a special modality
5:07 - Speed vs. precision in audio
6:17 - Controllable and promptable TTS
8:31 - What developers are building with the Live API
11:14 - URL context and async calling features
15:02 - Proactive audio and affective dialog
16:55 - Addressing developer feedback
21:54 - Live API roadmap
23:49 - The role of long context
24:57 - What's next for the Live API
26:41 - State of the AI audio market
30:10 - Advice for developers getting started with the Live API
31:16 - Live API demo
38:10 - Demo wrap up and closing
Robby Stein, VP of Product for Google Search, joins host Logan Kilpatrick to explore how Search is evolving into a frontier AI product. Their conversation covers the shift from simple keywords to complex, conversational queries, the rise of agentic capabilities that can take action on your behalf, and the vision to help billions of users truly "ask anything." Learn more about the technology behind AI Overviews, AI Mode, Deep Search, and the future of multimodal interaction.
Watch on YouTube: https://youtu.be/zUB5A_ezIOU
Chapters:
01:07 - Search as a Frontier AI Product
02:38 - Reaching 1.5 Billion Users
03:37 - What Is AI Mode?
04:17 - Understanding Query Fan-Out
05:18 - Balancing Latency and Performance with Gemini 2.5 Pro
06:51 - How Deep Search works
09:08 - Fine-tuning models for product experience
11:24 - Shifting user behaviors
14:07 - The rise of visual search
16:52 - Speech and conversational AI in Search
18:36 - Comparing Gemini and Search
20:04 - Real-time tool use in Search
22:52 - Evolving the Search interface
26:03 - Making Search more personal
29:15 - The agentic future of Search
31:15 - Agents beyond booking tickets
37:11 - On-the-fly software creation
38:06 - Google DeepMind and Search collaboration
40:08 - What's next for Search
Ani Baddepudi, Gemini Model Behavior Product Lead, joins host Logan Kilpatrick for a deep dive into Gemini's multimodal capabilities. Their conversation explores why Gemini was built as a natively multimodal model from day one, the future of proactive AI assistants, and how we are moving towards a world where "everything is vision." Learn about the differences between video and image understanding and token representations, higher FPS video sampling, and more.
Watch on YouTube: https://www.youtube.com/watch?v=K4vXvaRV0dw
Chapters:
0:00 - Intro
1:12 - Why Gemini is natively multimodal
2:23 - The technology behind multimodal models
5:15 - Video understanding with Gemini 2.5
9:25 - Deciding what to build next
13:23 - Building new product experiences with multimodal AI
17:15 - The vision for proactive assistants
24:13 - Improving video usability with variable FPS and frame tokenization
27:35 - What's next for Gemini's multimodal development
31:47 - Deep dive on Gemini's document understanding capabilities
37:56 - The teamwork and collaboration behind Gemini
40:56 - What's next with model behavior
Connie Fan, Product Lead for Gemini's coding capabilities, and Danny Tarlow, Research Lead for Gemini's coding capabilities, join host Logan Kilpatrick for an in-depth discussion on how the team built one of the world's leading AI coding models. Learn more about the early goals that shaped Gemini's approach to code, the rise of 'vibe coding' and its impact on development, strategies for tackling large codebases with long context and agents, and the future of programming languages in the age of AI.
Watch on YouTube: https://www.youtube.com/watch?v=jwbG_m-X-gE
Chapters:
0:00 - Intro
1:10 - Defining Early Coding Goals
6:23 - Ingredients of a Great Coding Model
9:28 - Adapting to Developer Workflows
11:40 - The Rise of Vibe Coding
14:43 - Code as a Reasoning Tool
17:20 - Code as a Universal Solver
20:47 - Evaluating Coding Models
24:30 - Leveraging Internal Googler Feedback
26:52 - Winning Over AI Skeptics
28:04 - Performance Across Programming Languages
33:05 - The Future of Programming Languages
36:16 - Strategies for Large Codebases
41:06 - Hill Climbing New Benchmarks
42:46 - Short-Term Improvements
44:42 - Model Style and Taste
47:43 - 2.5 Pro's Breakthrough
51:06 - Early AI Coding Experiences
56:19 - Specialist vs. Generalist Models
A conversation with Sergey Brin, co-founder of Google and computer scientist working on Gemini, in reaction to a year of progress with Gemini.
Watch on YouTube: https://www.youtube.com/watch?v=o7U4DV9Fkc0
Chapters:
0:20 - Initial reactions to I/O
2:00 - Focus on Gemini's core text model
4:29 - Native audio in Gemini and Veo 3
8:34 - Insights from model training runs
10:07 - Surprises in current AI developments vs. past expectations
14:20 - Evolution of model training
16:40 - The future of reasoning and Deep Think
20:19 - Google's startup culture and accelerating AI innovation
24:51 - Closing
Learn more:
AI Studio: https://aistudio.google.com/
Gemini Canvas: https://gemini.google.com/canvas
Mariner: https://labs.google.com/mariner/
Gemini Ultra: https://one.google.com/about/google-a...
Jules: https://jules.google/
Gemini Diffusion: https://deepmind.google/models/gemini...
Flow: https://labs.google/flow/about
Notebook LM: https://notebooklm.google.com/
Stitch: https://stitch.withgoogle.com/
Chapters:
0:59 - I/O Day 1 Recap
02:48 - Envisioning I/O 2030
08:11 - AI for Scientific Breakthroughs
09:20 - Veo 3 & Flow
7:35 - Gemini Live & the Future of Proactive Assistants
20:30 - Gemini in Chrome & Future Apps
22:28 - New Gemini Models: DeepThink, Diffusion & 2.5 Flash/Pro Updates
27:19 - Developer Momentum & Feedback Loop
31:50 - New Developer Products: Jules, Stitch & CodeGen in AI Studio
37:44 - Evolving Product Development Process with AI
39:23 - Closing
Explore the synergy between long context models and Retrieval Augmented Generation (RAG) in this episode of Release Notes. Join Google DeepMind's Nikolay Savinov as he discusses the importance of large context windows, how they enable AI agents, and what's next in the field.
Chapters:
0:52 - Introduction & defining tokens
5:27 - Context window importance
9:53 - RAG vs. Long Context
14:19 - Scaling beyond 2 million tokens
18:41 - Long context improvements since 1.5 Pro release
23:26 - Difficulty of attending to the whole context
28:37 - Evaluating long context: beyond needle-in-a-haystack
33:41 - Integrating long context research
34:57 - Reasoning and long outputs
40:54 - Tips for using long context
48:51 - The future of long context: near-perfect recall and cost reduction
54:42 - The role of infrastructure
56:15 - Long-context and agents
Tulsee Doshi, Head of Product for Gemini Models, joins host Logan Kilpatrick for an in-depth discussion on the latest Gemini 2.5 Pro Experimental launch. Gemini 2.5 is a well-rounded, multimodal thinking model designed to tackle increasingly complex problems. From enhanced reasoning to advanced coding, Gemini 2.5 can create impressive web applications and agentic code applications. Learn about the process of building Gemini 2.5 Pro Experimental, the improvements made across the stack, and what's next for Gemini 2.5.
Chapters:
0:00 - Introduction
1:05 - Gemini 2.5 launch overview
3:19 - Academic evals vs. vibe checks
6:19 - The jump to 2.5
7:51 - Coordinating cross-stack improvements
11:48 - Role of pre/post-training vs. test-time compute
13:21 - Shipping Gemini 2.5
15:29 - Embedded safety process
17:28 - Multimodal reasoning with Gemini 2.5
18:55 - Benchmark deep dive
22:07 - What's next for Gemini
24:49 - Dynamic thinking in Gemini 2.5
25:37 - The team effort behind the launch
Resources:
Gemini → https://goo.gle/41Yf72b
Gemini 2.5 blog post → https://goo.gle/441SHiV
Example of Gemini 2.5 Pro's game design skills → https://goo.gle/43vxkq1
Demo: Gemini 2.5 Pro Experimental in Google AI Studio → https://goo.gle/4c5RbhE
Dave Citron, Senior Director of Product Management, joins host Logan Kilpatrick for an in-depth discussion on the latest Gemini updates and demos. Learn more about Canvas for collaborative content creation, enhanced Deep Research with Thinking Models and Audio Overviews, and a new personalization feature.
Chapters:
0:00 - Introduction
0:59 - Recent Gemini app launches
2:00 - Introducing Canvas
5:12 - Canvas in action
8:46 - More Canvas examples
12:02 - Enhanced capabilities with Thinking Models
15:12 - Deep Research in action
20:27 - The future of agentic experiences
22:12 - Deep Research and Audio Overviews
24:11 - Personalization in Gemini app
27:50 - Personalization in action
29:58 - How personalization works: user data and privacy
32:30 - The future of personalization
Jack Rae, Principal Scientist at Google DeepMind, joins host Logan Kilpatrick for an in-depth discussion on the development of Google's thinking models. Learn more about practical applications of thinking models, the impact of increased 'thinking time' on model performance, and the key role of long context.
Chapters:
01:14 - Defining Thinking Models
03:40 - Use Cases for Thinking Models
07:52 - Thinking Time Improves Answers
09:57 - Rapid Thinking Progress
20:11 - Long Context Is Key
27:41 - Tools for Thinking Models
29:44 - Incorporating Developer Feedback
35:11 - The Strawberry Counting Problem
39:15 - Thinking Model Development Timeline
42:30 - Towards a GA Thinking Model
49:24 - Thinking Models Powering AI Agents
54:14 - The Future of AI Model Evals
Tulsee Doshi, Gemini model product lead, joins host Logan Kilpatrick to go behind the scenes of Gemini 2.0, taking a deep dive into the model's multimodal capabilities and native tool use, and Google's approach to shipping experimental models.
Watch on YouTube: https://www.youtube.com/watch?v=L7dw799vu5o
Chapters:
Meet Tulsee Doshi
Gemini's Progress Over the Past Year
Introducing Gemini 2.0
Shipping Experimental Models
Gemini 2.0’s Native Tool Use
Function Calling
Multimodal Agents
Rapid Fire Questions
Logan Kilpatrick sits down with Emanuel Taropa, a key figure in the development of Gemini, to delve into the cutting edge of AI. Taropa provides insights into the technical challenges and triumphs of building and deploying large language models, focusing on the recent release of the Gemini Flash 8B model.
Their conversation covers everything from the intricacies of model architecture and training to the practical challenges of shipping AI models at scale, and even speculates on the future of AI.