Episode 1 - Resilience Enablement

Update: 2025-11-10
Description

Season 3 of Making of the SRE Omelette is here - and it’s all about resilience.

Resilience isn’t just about surviving outages. It’s about building systems and cultures that adapt, learn, and thrive under pressure.


In our kickoff episode, we sit down with Dr. Jennifer Petoff, co-editor of Site Reliability Engineering: How Google Runs Production Systems and leader of Google’s Global SRE Education. Jennifer shares why resilience starts with people, not just technology—and how psychological safety and confidence are the secret ingredients for reliability at scale.


You’ll learn:

* How to scale learning like a production system


* Why postmortem culture drives improvement


* How to apply SRE principles beyond infrastructure


If you’ve ever wondered how to make reliability a business advantage, this episode is for you.


Check out How to SRE Anything here: https://www.reliablepgm.com/how-to-sre-anything/


Topics:

* Origins of SRE and Education at Google

How Google scaled SRE education globally.

Why education is treated like a production system (repeatable, reliable, measurable).


* Psychological Safety and Learning

Why psychological safety is critical for resilience.

Creating environments where teams can share mistakes without fear of blame.

How this accelerates learning and reliability.


* Hands-On Experience as a Learning Model

Importance of experiential learning (e.g., game days, simulations).

Why theory alone isn’t enough for building confidence under pressure.


* Scaling Knowledge Across Large Organizations

Strategies Google uses to scale SRE principles globally.

Balancing standardization with flexibility for local teams.


* Resilience Beyond Reliability

How resilience differs from reliability.

Building adaptive systems and teams that thrive through adversity.


* Culture as a Foundation

Why culture is the “secret ingredient” for successful SRE adoption.

Encouraging curiosity and collaboration across roles.


* Future of SRE Education

Trends in learning for distributed teams.

How continuous education supports evolving reliability practices.


Kevin Yu - Principal SRE, IBM Software