Chain of Thought

50 Episodes


How Intercom Cut $250K/Month by Ditching GPT for Qwen

2026-02-26 · 53:30

Intercom was spending $250K/month on a single summarization task using GPT. Then they replaced it with a fine-tuned 14B parameter Qwen model and saved almost all of it. In this episode, Intercom's Chief AI Officer, Fergal Reid, walks through exactly how they made that call, where their approach has changed over time, and how all of their efforts built their Fin customer service agent. Fergal breaks down how Fin went from 30% to nearly 70% resolution rate and why most of those gains came from surrounding systems (custom re-rankers, retrieval models, query canonicalization), not the core frontier LLM. He explains why higher latency counterintuitively increases resolution rates, how they built a custom re-ranker that outperformed Cohere using ModernBERT, and why he believes vertically integrated AI products will win in the long term.

If you're deciding between fine-tuning open-weight models and using frontier APIs in production, you won't find a more detailed decision process walkthrough.

🔗 Connect with Fergal:
Twitter/X: https://x.com/fergal_reid
LinkedIn: https://www.linkedin.com/in/fergalreid/
Fin: https://fin.ai/

🔗 Connect with Conor:
YouTube: https://www.youtube.com/@ConorBronsdon
Newsletter: https://conorbronsdon.substack.com/
Twitter/X: https://x.com/ConorBronsdon
LinkedIn: https://www.linkedin.com/in/conorbronsdon/

🔗 More episodes: https://chainofthought.show

CHAPTERS
0:00 Intro
0:46 Why Intercom Completely Reversed Their Fine-Tuning Position
8:00 The $250K/Month Summarization Task (Query Canonicalization)
11:25 Training Infrastructure: H200s, LoRA to Full SFT, and GRPO
14:09 Why Qwen Models Specifically Work for Production
18:03 Goodhart's Law: When Benchmarks Lie
19:47 A/B Testing AI in Production: Soft vs. Hard Resolutions
25:09 The Latency Paradox: Why Slower Responses Get More Resolutions
26:33 Why Per-Customer Prompt Branching Is Technical Debt
28:51 Sponsor: Galileo
29:36 Hiring Scientists, Not Just Engineers
32:15 Context Engineering: Intercom's Full RAG Pipeline
35:35 Customer Agent, Voice, and What's Next for Fin
39:30 Vertical Integration: Can App Companies Outrun the Labs?
47:45 When Engineers Laughed at Claude Code
52:23 Closing Thoughts

TAGS
Fergal Reid, Intercom, Fin AI agent, open-weight models, Qwen models, fine-tuning LLMs, post-training, RAG pipeline, customer service AI, GRPO reinforcement learning, A/B testing AI, Claude Code, vertical AI integration, inference cost optimization, context engineering, AI agents, ModernBERT reranker, scaling AI teams, Conor Bronsdon, Chain of Thought

How Block Deployed AI Agents to 12,000 Employees in 8 Weeks w/ MCP | Angie Jones

2026-01-21 · 50:26

How do you deploy AI agents to 12,000 employees in just 8 weeks? How do you do it safely? Angie Jones, VP of Engineering for AI Tools and Enablement at Block, joins the show to share exactly how her team pulled it off.

Block (the company behind Square and Cash App) became an early adopter of Model Context Protocol (MCP) and built Goose, their open-source AI agent that's now a reference implementation for the Agentic AI Foundation. Angie shares the challenges they faced, the security guardrails they built, and why letting employees choose their own models was critical to adoption.

We also dive into vibe coding (including Angie's experience watching Jack Dorsey vibe code a feature in 2 hours), how non-engineers are building their own tools, and what MCP unlocks when you connect multiple systems together.

Chapters:
00:00 Introduction
02:02 How Block deployed AI agents to 12,000 employees
05:04 Challenges with MCP adoption and security at scale
07:10 Why Block supports multiple AI models (Claude, GPT, Gemini)
08:40 Open source models and local LLM usage
09:58 Measuring velocity gains across the organization
10:49 Vibe coding: Benefits, risks & Jack Dorsey's 2-hour feature build
13:46 Block's contributions to the MCP protocol
14:38 MCP in action: Incident management + GitHub workflow demo
15:52 Addressing MCP criticism and security concerns
18:41 The Agentic AI Foundation announcement (Block, Anthropic, OpenAI, Google, Microsoft)
21:46 AI democratization: Non-engineers building MCP servers
24:11 How to get started with MCP and prompting tips
25:42 Security guardrails for enterprise AI deployment
29:25 Tool annotations and human-in-the-loop controls
30:22 OAuth and authentication in Goose
32:11 Use cases: Engineering, data analysis, fraud detection
35:22 Goose in Slack: Bug detection and PR creation in 5 minutes
38:05 Goose vs Claude Code: Open source, model-agnostic philosophy
38:17 Live Demo: Council of Minds MCP server (9-persona debate)
45:52 What's next for Goose: IDE support, ACP, and the $100K contributor grant
47:57 Where to get started with Goose

Connect with Angie on LinkedIn: https://www.linkedin.com/in/angiejones/
Angie's Website: https://angiejones.tech/
Follow Angie on X: https://x.com/techgirl1908
Goose GitHub: https://github.com/block/goose
Connect with Conor on LinkedIn: https://www.linkedin.com/in/conorbronsdon/
Follow Conor on X: https://x.com/conorbronsdon
Modular: https://www.modular.com/

Presented By: Galileo AI
Download Galileo's Mastering Multi-Agent Systems for free here: https://galileo.ai/mastering-multi-agent-systems

Topics Covered:
- How Block deployed Goose to all 12,000 employees
- Building enterprise security guardrails for AI agents
- Model Context Protocol (MCP) deep dive
- Vibe coding benefits and risks
- The Agentic AI Foundation (Block, Anthropic, OpenAI, Google, Microsoft, AWS)
- MCP sampling and the Council of Minds demo
- OAuth authentication for MCP servers
- Goose vs Claude Code and other AI coding tools
- Non-engineers building AI tools
- Fraud detection with AI agents
- Goose in Slack for real-time bug fixing

Gemini 3 & Robot Dogs: Inside Google DeepMind's AI Experiments | Paige Bailey

2026-01-14 · 50:52

Google DeepMind is reshaping the AI landscape with an unprecedented wave of releases, from Gemini 3 to robotics and even data centers in space. Paige Bailey, AI Developer Relations Lead at Google DeepMind, joins us to break down the full Google AI ecosystem. From her unique journey as a geophysicist-turned-AI-leader who helped ship GitHub Copilot, to now running developer experience for DeepMind's entire platform, Paige offers an insider's view of how Google is thinking about the future of AI.

The conversation covers the practical differences between Gemini 3 Pro and Flash, when to use the open-source Gemma models, and how tools like Anti-Gravity IDE, Jules, and Gemini CLI fit into developer workflows. Paige also demonstrates Space Math Academy, a gamified NASA curriculum she built using AI Studio, Colab, and Anti-Gravity, showing how modern AI tools enable rapid prototyping. The discussion then ventures into AI's physical frontier: robotics powered by Gemini on Raspberry Pi, Google's robotics trusted tester program, and the ambitious Project Suncatcher exploring data centers in space.

00:00 Introduction
01:30 Paige's Background & Connection to Modular
02:29 Gemini Integration Across Google Products
03:04 Jules, Gemini CLI & Anti-Gravity IDE Overview
03:48 Gemini 3 Flash vs Pro: Live Demo & Pricing
06:10 Choosing the Right Gemini Model
09:42 Google's Hardware Advantage: TPUs & JAX
10:16 TensorFlow History & Evolution to JAX
11:45 NeurIPS 2025 & Google's Research Culture
14:40 Google Brain to DeepMind: The Merger Story
15:24 Palm II to Gemini: Scaling from 40 People
18:42 Gemma Open Source Models
20:46 Anti-Gravity IDE Deep Dive
23:53 MCP Protocol & Chrome DevTools Integration
26:57 Gemini CLI in Google Colab
28:00 Image Generation & AI Studio Traffic Spikes
28:46 Space Math Academy: Gamified NASA Curriculum
31:31 Vibe Coding: Building with AI Studio & Anti-Gravity
36:02 AI From Bits to Atoms: The Robotics Frontier
36:40 Stanford Puppers: Gemini on Raspberry Pi Robots
38:35 Google's Robotics Trusted Tester Program
40:59 AI in Scientific Research & Automation
42:25 Project Suncatcher: Data Centers in Space
45:00 Sustainable AI Infrastructure
47:14 Non-Dystopian Sci-Fi Futures
47:48 Closing Thoughts & Resources

- Connect with Paige on LinkedIn: https://www.linkedin.com/in/dynamicwebpaige/
- Follow Paige on X: https://x.com/DynamicWebPaige
- Paige's Website: https://webpaige.dev/
- Google DeepMind: https://deepmind.google/
- AI Studio: https://ai.google.dev

Connect with our host Conor Bronsdon:
- Substack: https://conorbronsdon.substack.com/
- LinkedIn: https://www.linkedin.com/in/conorbronsdon/

Presented By: Galileo.ai
Download Galileo's Mastering Multi-Agent Systems for free here: https://galileo.ai/mastering-multi-agent-systems

Topics Covered:
- Gemini 3 Pro vs Flash comparison (pricing, speed, capabilities)
- When to use Gemma open-source models
- Anti-Gravity IDE, Jules, and Gemini CLI workflows
- Google's TPU hardware advantage
- History of TensorFlow, JAX, and Google Brain
- Space Math Academy demo (gamified education)
- AI-powered robotics (Stanford Puppers on Raspberry Pi)
- Project Suncatcher (orbital data centers)

Explaining Eval Engineering | Galileo's Vikram Chatterji

2025-12-19 · 37:14

You've heard of evaluations, but eval engineering is the difference between AI that ships and AI that's stuck in prototype.

Most teams still treat evals like unit tests: write them once, check a box, move on. But when you're deploying agents that make real decisions, touch real customers, and cost real money, those one-time tests don't cut it. The companies actually shipping production AI at scale have figured out something different: they've turned evaluations into infrastructure, into IP, into the layer where domain expertise becomes executable governance.

Vikram Chatterji, CEO and Co-founder of Galileo, returns to Chain of Thought to break down eval engineering: what it is, why it's becoming a dedicated discipline, and what it takes to actually make it work. Vikram shares why generic evals are plateauing, how continuous learning loops drive accuracy, and why he predicts "eval engineer" will become as common a role as "prompt engineer" once was.

In this conversation, Conor and Vikram explore:
- Why treating evals as infrastructure, not checkboxes, separates production AI from prototypes
- The plateau problem: why generic LLM-as-a-judge metrics can't break 90% accuracy
- How continuous human feedback loops improve eval precision over time
- The emerging "eval engineer" role and what the job actually looks like
- Why 60-70% of AI engineers' time is already spent on evals
- What multi-agent systems mean for the future of evaluation
- Vikram's framework for baking trust AND control into agentic applications

Plus: Conor shares news about his move to Modular and what it means for Chain of Thought going forward.

Chapters:
00:00 – Introduction: Why Evals Are Becoming IP
01:37 – What Is Eval Engineering?
04:24 – The Eval Engineering Course for Developers
05:24 – Generic Evals Are Plateauing
08:21 – Continuous Learning and Human Feedback
11:01 – Human Feedback Loops and Eval Calibration
13:37 – The Emerging Eval Engineer Role
16:15 – What Production AI Teams Actually Spend Time On
18:52 – Customer Impact and Lessons Learned
24:28 – Multi-Agent Systems and the Future of Evals
30:27 – MCP, A2A Protocols, and Agent Authentication
33:23 – The Eval Engineer Role: Product-Minded + Technical
34:53 – Final Thoughts: Trust, Control, and What's Next

Connect with Conor Bronsdon:
Substack – https://conorbronsdon.substack.com/
LinkedIn – https://www.linkedin.com/in/conorbronsdon/
X (Twitter) – https://x.com/ConorBronsdon

Learn more about Eval Engineering:
https://galileo.ai/evalengineering

Connect with Vikram Chatterji:
LinkedIn – https://www.linkedin.com/in/vikram-chatterji/

Debunking AI's Environmental Panic | Andy Masley

2025-11-26 · 59:02

AI is destroying the planet, or so we've been told. This week on Chain of Thought, we tackle one of the most persistent and misleading narratives in the AI conversation.

Andy Masley, Director of Effective Altruism DC, joins host Conor Bronsdon to fact-check the absurd AI environmental claims you've heard at parties, in articles, and even in bestselling books. Andy recently went viral for discovering what he calls "the single most egregious math mistake" he's ever seen in a book: a data center water usage calculation in Karen Hao's NYT Bestseller, Empire of AI, that was off by a factor of 4,500.

In this conversation, Andy and Conor break down the myths around AI’s water and energy usage and explore:
- The viral Empire of AI error and what it reveals about the broader debate
- Why most AI water usage statistics are misleading or flat-out wrong
- How one ChatGPT prompt represents just 1/150,000th of your daily emissions
- Trade-offs around data center cooling + decision making
- Why "tribal thinking" about AI is distorting environmental activism
- Where AI might actually help the climate through deep learning optimization

If you've ever felt guilty about using AI tools, been cornered at a party about AI's environmental impact, or simply want to understand what the data actually says, this episode, and Andy’s deep dive articles, arm you with the facts.

Chapters:
00:00 – Introduction: The Party Guilt Problem
01:54 – Andy's Background and What Sparked This Work
03:50 – The 4,500x Error in Empire of AI
06:39 – Breaking Down the Math: Liters vs. Cubic Meters
10:39 – The Unintended Consequence: Air Cooling vs. Water Cooling
12:51 – Karen Hao's Response and What's Still Missing
19:08 – Why Environmentalists Should Focus Elsewhere
21:41 – The Danger of Tribal Thinking About AI
25:49 – What Is Effective Altruism (And Why People Attack It)
29:15 – EA, AI Risk, and P(doom)
34:31 – Why Misinformation Hurts Your Own Side
37:39 – Using ChatGPT Is Not Bad for the Environment
42:14 – The Party Rebuttal: Practical Comparisons
45:23 – Water Use Reality: 1/800,000th of Your Daily Footprint
48:27 – The Personal Carbon Footprint Distraction
53:38 – Data Centers: Efficiency vs. Whether to Build
55:13 – AI's Net Climate Impact: The Positive Case
59:34 – Deep Learning, Smart Grids, and Climate Optimization
1:03:45 – Final Thoughts

Key references
IEA Study: AI and climate change - https://www.iea.org/reports/energy-and-ai/ai-and-climate-change#abstract
Nature: https://www.nature.com/articles/s44168-025-00252-3
The Empire of AI Error: https://andymasley.substack.com/p/empire-of-ai-is-wildly-misleading
Using ChatGPT isn’t bad for the environment: https://andymasley.substack.com/p/a-short-summary-of-my-argument-that
https://andymasley.substack.com/p/a-cheat-sheet-for-conversations-about

Connect with Andy Masley:
Substack – https://andymasley.substack.com/
X (Twitter) – https://x.com/AndyMasley

Connect with Conor Bronsdon:
Substack – https://conorbronsdon.substack.com/
LinkedIn – https://www.linkedin.com/in/conorbronsdon/
X (Twitter) – https://x.com/ConorBronsdon

The Critical Infrastructure Behind the AI Boom | Cisco CPO Jeetu Patel

2025-11-19 · 01:18:10

AI is accelerating at a breakneck pace, but model quality isn’t the only constraint we face. There are major infrastructure requirements, energy needs, security, and data pipelines to run AI at scale. This week on Chain of Thought, Cisco’s President and Chief Product Officer Jeetu Patel joins host Conor Bronsdon to reveal what it actually takes to build the critical foundation for the AI era.

Jeetu breaks down the three bottlenecks he sees holding AI back today:
• Infrastructure limits: not enough power, compute, or data center capacity
• A trust deficit: non-deterministic models powering systems that must be predictable
• A widening data gap: human-generated data plateauing while machine data explodes

Jeetu then shares how Cisco is tackling these challenges through secure AI factories, edge inference, open multi-model architectures, and global partnerships with Nvidia, G42, and sovereign cloud providers. Jeetu also explains why he thinks enterprises will soon rely on thousands of specialized models, not just one, and how routing, latency, cost, and security shape this new landscape.

Conor and Jeetu also explore high-performance leadership and team culture, discussing building high-trust teams, embracing constructive tension, staying vigilant in moments of success, and the personal experiences that shaped Jeetu’s approach to innovation and resilience.

If you want a clearer picture of the global AI infrastructure race, how high-level leaders are thinking about the future, and what it all means for enterprises, developers, and the future of work, this conversation is essential.

Chapters:
00:00 – Welcome to Chain of Thought
0:48 - AI and Jobs: Beyond the Hype
6:15 - The Real AI Opportunity: Original Insights
10:00 - Three Critical AI Constraints: Infrastructure, Trust, and Data
16:27 - Cisco's AI Strategy and Platform Approach
19:18 - Edge Computing and Model Innovation
22:06 - Strategic Partnerships: Nvidia, G42, and the Middle East
29:18 - Acquisition Strategy: Platform Over Products
32:03 - Power and Infrastructure Challenges
36:06 - Building Trust Across Global Partnerships
38:03 - US vs. China: The AI Infrastructure Race
40:33 - America's Venture Capital Advantage
42:06 - Acquisition Philosophy: Strategy First
45:45 - Defining Cisco's True North
48:06 - Mission-Driven Innovation Culture
50:15 - Hiring for Hunger, Curiosity, and Clarity
56:27 - The Power of Constructive Conflict
1:00:00 - Career Lessons: Continuous Learning
1:02:24 - The Email Question
1:04:12 - Joe Tucci's Four-Column Exercise
1:08:15 - Building High-Trust Teams
1:10:12 - The Five Dysfunctions Framework
1:12:09 - Leading with Vulnerability
1:16:18 - Closing Thoughts and Where to Connect

Connect with Jeetu Patel:
LinkedIn – https://www.linkedin.com/in/jeetupatel/
X (Twitter) – https://x.com/jpatel41
Cisco – https://www.cisco.com/

Connect with Conor Bronsdon:
Substack – https://conorbronsdon.substack.com/
LinkedIn – https://www.linkedin.com/in/conorbronsdon/
X (Twitter) – https://x.com/ConorBronsdon

Beyond Transformers: Maxime Labonne on Post-Training, Edge AI, and the Liquid Foundation Model Breakthrough

2025-11-12 · 52:30

The transformer architecture has dominated AI since 2017, but it’s not the only approach to building LLMs, and new architectures are bringing LLMs to edge devices.

Maxime Labonne, Head of Post-Training at Liquid AI and creator of the 67,000+ star LLM Course, joins Conor Bronsdon to challenge the AI architecture status quo. Liquid AI’s hybrid architecture, combining transformers with convolutional layers, delivers faster inference, lower latency, and dramatically smaller footprints without sacrificing capability. This alternative architectural philosophy creates models that run effectively on phones and laptops without compromise.

But reimagined architecture is only half the story. Maxime unpacks the post-training reality most teams struggle with: challenges and opportunities of synthetic data, how to balance helpfulness against safety, Liquid AI’s approach to evals, RAG architectural approaches, how he sees AI on edge devices evolving, hard-won lessons from shipping LFM1 through 2, and much more.

If you're tired of surface-level AI takes and want to understand the architectural and engineering decisions behind production LLMs from someone building them in the trenches, this is your episode.

Connect with Maxime Labonne:
LinkedIn – https://www.linkedin.com/in/maxime-labonne/
X (Twitter) – @maximelabonne
About Maxime – https://mlabonne.github.io/blog/about.html
HuggingFace – https://huggingface.co/mlabonne
The LLM Course – https://github.com/mlabonne/llm-course
Liquid AI – https://liquid.ai

Connect with Conor Bronsdon:
X (Twitter) – @conorbronsdon
Substack – https://conorbronsdon.substack.com/
LinkedIn – https://www.linkedin.com/in/conorbronsdon/

00:00 Intro - Welcome to Chain of Thought
00:27 Guest Intro - Maxime Labonne of Liquid AI
02:21 The Hybrid LLM Architecture Explained
06:30 Why Bigger Models Aren’t Always Better
11:10 Convolution + Transformers: A New Approach to Efficiency
18:00 Running LLMs on Laptops and Wearables
22:20 Post-Training as the Real Moat
25:45 Synthetic Data and Reliability in Model Refinement
32:30 Evaluating AI in the Real World
38:11 Benchmarks vs Functional Evals
43:05 The Future of Edge-Native Intelligence
48:10 Closing Thoughts & Where to Find Maxime Online

Architecting AI Agents: The Shift from Models to Systems | Aishwarya Srinivasan, Fireworks AI Head of AI Developer Relations

2025-10-08 · 53:25

Most AI agents are built backwards, starting with models instead of system architecture.

Aishwarya Srinivasan, Head of AI Developer Relations at Fireworks AI, joins host Conor Bronsdon to explain the shift required to build reliable agents: stop treating them as model problems and start architecting them as complete software systems. Benchmarks alone won't save you.

Aish breaks down the evolution from prompt engineering to context engineering, revealing how production agents demand careful orchestration of multiple models, memory systems, and tool calls. She shares battle-tested insights on evaluation-driven development, the rise of open source models like DeepSeek v3, and practical strategies for managing autonomy with human-in-the-loop systems. The conversation addresses critical production challenges, ranging from LLM-as-judge techniques to navigating compliance in regulated environments.

Connect with Aishwarya Srinivasan:
LinkedIn: https://www.linkedin.com/in/aishwarya-srinivasan/
Instagram: https://www.instagram.com/the.datascience.gal/

Connect with Conor: https://www.linkedin.com/in/conorbronsdon/

00:00 Intro - Welcome to Chain of Thought
00:22 Guest Intro - Ash Srinivasan of Fireworks AI
02:37 The Challenge of Responsible AI
05:44 The Hidden Risks of Reward Hacking
07:22 From Prompt to Context Engineering
10:14 Data Quality and Human Feedback
14:43 Quantifying Trust and Observability
20:27 Evaluation-Driven Development
30:10 Open Source Models vs. Proprietary Systems
34:56 Gaps in the Open-Source AI Stack
38:45 When to Use Different Models
45:36 Governance and Compliance in AI Systems
50:11 The Future of AI Builders
56:00 Closing Thoughts & Follow Ash Online

Follow the hosts: Atin, Conor, Vikram, Yash

The accidental algorithm: Melisa Russak, AI research scientist at WRITER

2025-10-01 · 21:09

This week, we're doing something special and sharing an episode from another podcast we love: The Humans of AI by our friends at Writer. We're huge fans of their work, and you might remember Writer's CEO, May Habib, from the inaugural episode of our own show.

From The Humans of AI:

Learn how Melisa Russak, lead research scientist at WRITER, stumbled upon fundamental machine learning algorithms, completely unaware of existing research, twice. Her story reveals the power of approaching problems with fresh eyes and the innovative breakthroughs that can occur when constraints become catalysts for creativity.

Melisa explores the intersection of curiosity-driven research, accidental discovery, and systematic innovation, offering valuable insights into how WRITER is pushing the boundaries of enterprise AI. Tune in to learn how her journey from a math teacher in China to a pioneer in AI research illuminates the future of technological advancement.

Follow Today's Guest(s)
Check out Writer’s YouTube channel to watch the full interviews.
Learn more about WRITER at writer.com.
Follow Melisa on LinkedIn
Follow May on LinkedIn

Check out Galileo: Try Galileo, Agent Leaderboard

If Code Generation is Solved What's Next? | Graphite’s Greg Foster

2025-09-24 · 54:39

The incredible velocity of AI coding tools has shifted the critical bottleneck in software development from code generation to code reviews. Greg Foster, Co-Founder & CTO of Graphite, joins the conversation to explore this new reality, outlining the three waves of AI that are leading to autonomous agents spawning pull requests in the background. He argues that as AI automates the "inner loop" of writing code, the human-centric "outer loop" (reviewing, merging, and deploying) is now under immense pressure, demanding a complete rethinking of our tools and processes.

The conversation then gets tactical, with Greg detailing how a technique called "stacking" can break down large code changes into manageable units for both humans and AI. He also identifies an emerging hiring gap where experienced engineers with strong architectural context are becoming "lethal" with AI tools. This episode is an essential guide to navigating the new bottlenecks in software development and understanding the skills that will define the next generation of high-impact engineers.

Follow Today's Guest(s)
Connect with Greg on LinkedIn
Follow Greg on X
Graphite Website: graphite.dev

Vercel's Playbook for AI Agents: From Vibe Check to Production | Malte Ubl

2025-09-10 · 54:24

What’s the first step to building an enterprise-grade AI tool? Malte Ubl, CTO of Vercel, joins us this week to share Vercel’s playbook for agents, explaining how agents are a new type of software for solving flexible tasks. He shares how Vercel's developer-first ecosystem, including tools like the AI SDK and AI Gateway, is designed to help teams move from a quick proof-of-concept to a trusted, production-ready application.

Malte explores the practicalities of production AI, from the importance of eval-driven development to debugging chaotic agents with robust tracing. He offers a critical lesson on security, explaining why prompt injection requires a totally different solution - tool constraint - than traditional threats like SQL injection. This episode is a deep dive into the infrastructure and mindset, from sandboxes to specialized SLMs, required to build the next generation of AI tools.

Follow Today's Guest(s)
Connect with Malte on LinkedIn
Follow Malte on X (formerly Twitter)
Learn more about Vercel

From Demo to Defensibility: How to Build an AI Business that Lasts | Aurimas Griciūnas

2025-08-27 · 51:46

The technological moat is eroding in the AI era. What new factors separate a successful startup from the rest?

Aurimas Griciūnas, CEO of SwirlAI, joins the show to break down the realities of building in this new landscape. Startup success now hinges on speed, strong financial backing, or immediate distribution. Aurimas warns against the critical mistake of prioritizing shiny tools over fundamental engineering and the market gaps this creates.

Discover the new moats for AI companies, built on a culture of relentless execution, tight feedback loops, and the surprising skills that define today's most valuable engineers.

The episode also looks to the future, with bold predictions about a slowdown in LLM leaps and the coming impact of coding agents and self-improving systems.

Follow Today's Guest(s)
Connect with Aurimas on LinkedIn
Aurimas' Course: End-to-End AI Engineering Bootcamp

Mindset Over Metrics: How to Approach AI Engineering | Hamel Husain

2025-08-20 · 42:09

As we enter the era of the AI engineer, the biggest challenge isn't technical - it's a shift in mindset. Hamel Husain, a leading AI consultant and luminary in the eval space, joins the podcast to explore the skills and processes needed to build reliable AI.

Hamel explains why many teams relying on vanity dashboards and a "buffet of metrics" experience a false sense of security, which is no substitute for customized evals tailored to domain-specific risks. The solution? A disciplined process of error analysis, grounded in manually looking at data to identify real-world failures.

This discussion is an essential guide to building the continuous learning loops and "experimentation mindset" required to take AI products from prototype to production with confidence. Listen to learn the playbook for building AI reliability, and derive qualitative insights from log data to build customized quantitative guardrails.

Follow Today's Guest(s)
Connect with Hamel on LinkedIn
Follow Hamel on X/Twitter
Check out his blog: hamel.dev

How AI Velocity is Rewriting the Rules for Engineering Leaders | ChatPRD's Claire Vo

2025-08-13 · 42:55

What if your next competitor is not a startup, but a solo builder on a side project shipping features faster than your entire team? For Claire Vo, that's not a hypothetical. As the founder of ChatPRD, formerly the Chief Product and Technology Officer at LaunchDarkly, and host of the How I AI podcast, she has a unique vantage point on the driving forces behind a new blueprint for success.

She argues that AI accountability must be driven from the top by an "AI czar" and reveals how a culture of experimentation is the key to overcoming organizational hesitancy. Drawing from her experience as a solo founder, she warns that for incumbents, the cost of moving slowly is the biggest threat and details how AI can finally be used to tackle legacy codebases. The conversation closes with bold predictions on the rise of the "super IC" - who can achieve top-tier impact and salary without managing a team - and the death of product management.

Follow Today's Guest(s)
Connect with Claire on LinkedIn
Follow Claire on X/Twitter
Claire’s podcast How I AI

Building an AI-Native Startup | GrowthX's Marcel Santilli

2025-08-06 · 23:53

How do you build an AI-native company to a $7M run rate in just six months?

According to Marcel Santilli, Founder and CEO of GrowthX, the secret isn't chasing the next frontier model, it's mastering the "messy middle." Drawing on his deep experience at Scale AI and Deepgram, Marcel joins host Conor Bronsdon to share his framework for building durable, customer-obsessed businesses.

Marcel argues that the most critical skills for the AI era aren't technical but philosophical: first-principles thinking and the art of delegation.

Tune in to learn why GrowthX first focused on services to codify expert work, how AI can augment human talent instead of replacing it, and why speed and brand are a startup's greatest competitive advantages. This conversation offers a clear playbook for building a resilient company by prioritizing culture and relentless shipping.

Follow Today's Guest(s)
Connect with Marcel on LinkedIn
Follow Marcel on X (formerly Twitter)
Learn more about GrowthX

Can AI Fix Healthcare? | Corti's Andreas Cleve

2025-07-30 47:11

AI isn't just changing healthcare; it's providing the essential help needed to unlock a trillion-dollar opportunity for better care.
Andreas Cleve, CEO & Co-founder of Corti, steps in to shed light on AI's immense, yet often misunderstood, transformative potential in this high-stakes environment. Andreas refutes the narrative that healthcare is a slow adopter, emphasizing its high bar for trustworthy technology and its constant embrace of new tools. He reveals how purpose-built AI models are already alleviating the "pajama time" burden of documentation for clinicians, enabling faster and more accurate assessments in various specializations. This quiet, impactful adoption is seeing companies grow "like weeds" beyond common expectations.
The conversation addresses how AI can tackle the looming global shortage of 10 million healthcare professionals by 2030, reallocating a trillion dollars' worth of administrative work back into care. Andreas details Corti’s approach to building invisible, reliable AI through rigorous, compliance-first evaluation, ensuring accuracy and efficiency in real time. He emphasizes that AI's true role is not replacement, but augmentation, empowering professionals to deliver more care, attract talent, and drive organizational growth.
Follow Today's Guest(s): LinkedIn: linkedin.com/in/andreascleve, X (formerly Twitter): andreascleve, Corti Website: corti.ai

Mastering Multi-Agent Systems | MongoDB’s Mikiko Chandrasekhar

2025-07-23 40:23

AI agents offer unprecedented power, but mastering agent reliability is the ultimate challenge for agentic systems to actually work in production.
Mikiko Chandrashekar, Staff Developer Advocate at MongoDB, whose background spans the entire data-to-AI pipeline, unveils MongoDB's vision as the memory store for agents, supporting complex multi-agent systems from data storage and vector search to debugging chat logs. She highlights how MongoDB, reinforced by the acquisition of Voyage, empowers developers to build production-scale agents across various industries, from solo projects to major enterprises. This robust data layer is foundational to ensure agent performance and improve the end user experience.
Mikiko advocates for treating agents as software products, applying rigorous engineering best practices to ensure reliability, even for non-deterministic systems. She details MongoDB's unique position to balance GPU/CPU loads and manage data for performance and observability, including Galileo's integrations. The conversation emphasizes the profound need to rethink observability, evaluations, and guardrails in the era of agents, showcasing Galileo's family of small language models for real-time guardrailing, Luna-2, and Insights Engine for automated failure analysis.
Discover how building trustworthiness through systematic evaluation, beyond just "vibe checks," is essential for AI agents to scale and deliver value in high-stakes use cases.
Follow Today's Guest(s): Connect with Mikiko on LinkedIn, Follow Mikiko on X/Twitter, Explore Mikiko's YouTube channel, Check out Mikiko's Substack, Connect with MongoDB on LinkedIn, Connect with MongoDB on YouTube

The AI Agent Trust Gap: Bridging Risk to Reliability | Elastic’s Philipp Krenn

2025-07-16 44:11

The age of ubiquitous AI agents is here, bringing immense potential and unprecedented risk.
Hosts Conor Bronsdon and Vikram Chatterji open the episode by discussing the urgent need for building trust and reliability into next-generation AI agents. Vikram unveils Galileo's free AI reliability platform for agents, featuring Luna 2 SLMs for real-time guardrails and its Insights Engine for automatic failure mode analysis. This platform enables cost-effective, low-latency production evaluations, significantly transforming debugging. Achieving trustworthy AI agents demands rigorous testing, continuous feedback, and robust guardrailing: complex challenges requiring powerful solutions from partners like Elastic.
Conor welcomes Philipp Krenn, Director of Developer Relations at Elastic, to discuss their collaboration in ensuring AI agent reliability, including how Elastic leverages Galileo's platform for evaluation. Philipp details Elastic's evolution from a search powerhouse to a key AI enabler, transforming data access with Retrieval-Augmented Generation (RAG) and new interaction modes. He discusses Elastic's investment in SLMs for efficient re-ranking and embeddings, emphasizing robust evaluation and observability for production.
This collaborative effort aims to equip developers to build reliable, high-performing AI systems for every enterprise.
Chapters:
00:00 Introduction
01:09 Galileo's AI Reliability Platform
01:43 Challenges in AI Agent Reliability
06:17 Insights Engine and Its Importance
11:00 Luna 2: Small Language Models
14:42 Custom Metrics and Agent Leaderboard
19:16 Galileo's Integrations and Partnerships
21:04 Philipp Krenn from Elastic
24:47 Optimizing LLM Responses
25:41 Galileo and Elastic: A Powerful Partnership
28:20 Challenges in AI Production and Trust
30:02 Guardrails and Reliability in AI Systems
32:17 The Future of AI in Customer Interaction
Follow Today's Guest(s): Connect with Philipp on LinkedIn, Learn more about Elastic

Architecting Reliable Agentic AI | Cisco’s Giovanna Carofiglio on the AGNTCY Collective

2025-07-09 41:02

The Internet of Agents is rapidly taking shape, necessitating innovative foundational standards, protocols, and evaluation methods for its success.
Recorded at Cisco's office in San Jose, we welcome Giovanna Carofiglio, Distinguished Engineer and Senior Director at Outshift by Cisco. As a leader of the AGNTCY Collective (an open-source initiative by Cisco, Galileo, LangChain, and many other participating companies), Giovanna outlines the vision for agents to collaborate seamlessly across the enterprise and the internet. She details the collective's pillars, from agent discovery and deployment using new agentic protocols like Slim, to ensuring a secure, low-latency communication transport layer. This groundbreaking work aims to make distributed agentic communication a reality.
The conversation then explores the critical role of observability and evaluation in building trustworthy agent applications, including defining an interoperable standard schema for communications. Giovanna highlights the complex challenges of scaling agents to thousands or millions, emphasizing the need for robust security (agent identity with OSF schema) and predictable agent behavior through extensive testing and characterization. She distinguishes between protocols like MCP (agent-to-tool) and A2A (agent-to-agent), advocating for open standards and underlying transport layers akin to TCP.
Chapters:
00:00 Introduction
01:00 Overview of Agent Interoperability
02:20 What is AGNTCY
03:45 Agent Discovery and Composition
04:38 Agent Protocols and Communication
05:45 Observability and Evaluation
07:00 Metrics and Standards for Agents
09:45 Challenges in Agent Evaluation
14:15 Low Latency and Active Evaluation
23:34 Synthetic Data and Ground Truth
25:07 Interoperable Agent Schema
26:37 MCP & A2A
30:17 Future of Agent Communication
32:03 Security and Agent Identity
34:37 Collaboration and Community Involvement
38:28 Conclusion
Follow Today's Guest(s): AGNTCY Collective: agntcy.org, Connect with Giovanna on LinkedIn, Learn more about Outshift: outshift.cisco.com

Taste Is The New Moat | Intangible CEO on Brand, Distribution, and Winning in AI

2025-07-02 53:01

When AI makes creating content and code nearly free, how do you stand out? Differentiation now hinges on two things: unique taste and effective distribution.
This week, Bharat Vasan, founder & CEO at Intangible and a "recovering VC," explains why the AI landscape compelled him to return to founding. He sees AI sparking a new creative revolution, similar to the early internet, that makes it easier than ever to bring ideas to life. The conversation delivers essential advice for founders, revealing why relentless shipping is the ultimate clarifier for a business and why resilience, not just intelligence, is the key to survival.
Drawing from his experience on both sides of the venture table, Bharat breaks down the brutally competitive VC landscape and shares Intangible's mission: to simplify 3D creative tools with AI, finally bridging the gap between human vision and machine power. Listeners will gain insights on company building, brand strategy, and why customer obsession is the ultimate moat in the AI age.
Chapters:
00:00 Introduction
00:45 From Founder to VC and Back
03:17 Human Creativity in the Age of AI
07:50 The Role of Taste and Distribution
11:49 Building a Brand in the AI Era
16:17 The Venture Capital Landscape for AI Startups
20:11 Advice for Founders in the AI Boom
23:55 Incumbents vs. Startups
27:10 The New Generation of Innovators
29:19 Pirate Mentality in Startups
30:00 Building a Brand
36:28 Shipping and Resilience
41:49 Customer Obsession
46:58 The Vision for Intangible
51:52 Conclusion
Follow Today's Guest(s): Connect with Bharat on LinkedIn, Follow Bharat on X, Learn more about Intangible at intangible.ai
