The SoftwarePlaza IT Podcast
Running LLMs on Kubernetes: From GPU Bottlenecks to Reliable, Production-Grade Inference

Update: 2025-09-18

Description

In this episode, Abdel Sghiouar, Senior Cloud Developer Advocate at Google and CNCF Ambassador, unpacks the technical and cultural shifts happening in the Kubernetes ecosystem. As co-host of the Kubernetes Podcast, he shares how enterprises from LinkedIn to CERN are pushing Kubernetes beyond microservices into powering GenAI workloads. Abdel dives into the hardware bottlenecks, operational hurdles, and community-driven innovations shaping the future of Kubernetes.
