Vanishing Gradients
Episode 43: Tales from 400+ LLM Deployments: Building Reliable AI Agents in Production

Update: 2025-01-16

Description

Hugo speaks with Alex Strick van Linschoten, Machine Learning Engineer at ZenML and creator of a comprehensive LLMOps database documenting over 400 deployments. Alex's extensive research into real-world LLM implementations gives him unique insight into what actually works—and what doesn't—when deploying AI agents in production.



In this episode, we dive into:




  • The current state of AI agents in production, from successes to common failure modes

  • Practical lessons learned from analyzing hundreds of real-world LLM deployments

  • How companies like Anthropic, Klarna, and Dropbox are using patterns like ReAct, RAG, and microservices to build reliable systems

  • The evolution of LLM capabilities, from expanding context windows to multimodal applications

  • Why most companies still prefer structured workflows over fully autonomous agents (see the sketch after this list)


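To make that last point concrete, here is a minimal, illustrative sketch (not taken from the episode) of a structured workflow: the application fixes the sequence retrieve, generate, check, and the model never decides which step runs next. The search_docs and call_llm functions are hypothetical placeholders for a retrieval layer and a model client.

# Illustrative sketch only: a fixed, structured workflow where the
# application, not the model, controls the sequence of steps.
# search_docs and call_llm are hypothetical stand-ins for your
# retrieval layer and model client.

def search_docs(query: str) -> list[str]:
    """Hypothetical retrieval step; replace with your vector store or search API."""
    return ["relevant documentation snippet", "prior deployment notes"]

def call_llm(prompt: str) -> str:
    """Hypothetical model call; replace with your provider's client."""
    return "Grounded answer based on the provided context."

def answer_question(question: str) -> str:
    # Step 1: retrieve context (RAG-style grounding).
    context = "\n".join(search_docs(question))
    # Step 2: generate an answer constrained to that context.
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    draft = call_llm(prompt)
    # Step 3: a deterministic guard the application controls.
    if not draft.strip():
        raise ValueError("Empty model response; fall back or escalate.")
    return draft

print(answer_question("What patterns show up across LLM deployments?"))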

We also explore real-world case studies of production hurdles, including cascading failures, API misfires, and hallucination challenges. Alex shares concrete strategies for integrating LLMs into your pipelines while maintaining reliability and control.
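One reliability pattern of the kind alluded to above, sketched under the assumption that your pipeline expects structured JSON from the model: validate the output, retry a bounded number of times, and fail explicitly rather than letting a malformed response cascade into downstream stages. call_llm is again a hypothetical placeholder, not an API from the episode.

# Illustrative sketch only: validate model output and retry before
# failing loudly, so one bad response cannot cascade downstream.
import json

def call_llm(prompt: str) -> str:
    """Hypothetical model call; replace with your provider's client."""
    return '{"summary": "ok", "confidence": 0.9}'

def call_with_validation(prompt: str, max_retries: int = 2) -> dict:
    last_error = None
    for attempt in range(max_retries + 1):
        raw = call_llm(prompt)
        try:
            parsed = json.loads(raw)
            # Minimal schema check; reject anything missing required fields.
            if "summary" in parsed and "confidence" in parsed:
                return parsed
            last_error = ValueError(f"Missing fields on attempt {attempt}: {raw!r}")
        except json.JSONDecodeError as exc:
            last_error = exc
    # Fail explicitly instead of passing malformed output to the next stage.
    raise RuntimeError("LLM output failed validation") from last_error

print(call_with_validation("Summarize the deployment as JSON."))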



Whether you're scaling agents or building LLM-powered systems, this episode offers practical insights for navigating the complex landscape of LLMOps in 2025.



LINKS



Hugo Bowne-Anderson