Episode 54: Scaling AI: From Colab to Clusters — A Practitioner’s Guide to Distributed Training and Inference

Update: 2025-07-18

Description

Colab is cozy. But production won’t fit on a single GPU.

Zach Mueller leads Accelerate at Hugging Face and spends his days helping people go from solo scripts to scalable systems. In this episode, he joins me to demystify distributed training and inference — not just for research labs, but for any ML engineer trying to ship real software.



We talk through:

• From Colab to clusters: why scaling isn’t just about training massive models, but also about serving agents, handling load, and speeding up iteration

• Zero-to-two GPUs: how to get started without Kubernetes, Slurm, or a PhD in networking

• Scaling tradeoffs: when to care about interconnects, which infra bottlenecks actually matter, and how to avoid chasing performance ghosts

• The GPU middle class: strategies for training and serving on a shoestring, with just a few cards or modest credits

• Local experiments, global impact: why learning distributed systems—even just a little—can set you apart as an engineer



If you’ve ever stared at a Hugging Face training script and wondered how to run it on something more than your laptop: this one’s for you.
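To make that concrete, here is a minimal sketch (not from the episode) of what the first step beyond a single-GPU Colab script often looks like with Accelerate, the library Zach maintains. The toy model and synthetic data below are placeholders; the point is that the same training loop runs unchanged on CPU, a single GPU, or several GPUs.

# Minimal sketch: a plain PyTorch loop wrapped with Hugging Face Accelerate.
# The model and data are stand-in toys; swap in your own.
import torch
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator

accelerator = Accelerator()  # detects the available device(s) / distributed setup

model = torch.nn.Linear(128, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
dataset = TensorDataset(torch.randn(1024, 128), torch.randint(0, 2, (1024,)))
dataloader = DataLoader(dataset, batch_size=32, shuffle=True)

# prepare() moves everything to the right device(s) and, when launched across
# multiple processes, gives each process its own shard of the data.
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

for epoch in range(3):
    for inputs, targets in dataloader:
        optimizer.zero_grad()
        loss = torch.nn.functional.cross_entropy(model(inputs), targets)
        accelerator.backward(loss)  # replaces loss.backward()
        optimizer.step()

In a typical setup you run "accelerate config" once to describe your hardware, then launch the script with "accelerate launch train.py"; the same command scales from a laptop to a multi-GPU node.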



LINKS

🎓 Learn more:

📺 Watch the video version on YouTube: YouTube link

Hugo Bowne-Anderson