The SoftwarePlaza IT Podcast
Running LLMs on Kubernetes: From GPU Bottlenecks to Reliable, Production-Grade Inference

Update: 2025-09-18

Description

In this episode, Abdel Sghiouar, Senior Cloud Developer Advocate at Google and CNCF Ambassador, unpacks the technical and cultural shifts happening in the Kubernetes ecosystem. As co-host of the Kubernetes Podcast, he shares how enterprises from LinkedIn to CERN are pushing Kubernetes beyond microservices into powering GenAI workloads. Abdel dives into the hardware bottlenecks, operational hurdles, and community-driven innovations shaping the future of Kubernetes.
