KubeFM
From Fragile to Faultless: Kubernetes Self-Healing In Practice, with Grzegorz Głąb

Update: 2025-04-29
Description

Discover how to build resilient Kubernetes environments at scale with practical automation strategies from an engineer who's tackled complex production challenges.

Grzegorz Głąb, Kubernetes Engineer at Cloud Kitchens, shares his team's journey developing a comprehensive self-healing framework. He explains how they addressed issues ranging from spot node preemptions to network packet drops caused by unbalanced IRQs, providing concrete examples of automation that prevents downtime and improves reliability.

You will learn:

  • How managed Kubernetes services like AKS provide benefits but require customization for specific use cases

  • The architecture of an effective self-healing framework built from Kubernetes-native components such as DaemonSets and Deployments

  • Practical solutions for common challenges like StatefulSet pods stuck on unreachable nodes and cleaning up orphaned pods

  • Techniques for workload-level automation, including throttling CPU-hungry pods and automating diagnostic data collection

Sponsor

This episode is sponsored by LearnKube: get started on your Kubernetes journey through comprehensive online, in-person, or remote training.
