The Reliability Diagnosis: Google’s Steve McGhee on Debugging and Incident Response

Update: 2025-02-10

Description

In this episode of Humans of Reliability, we sit down with Steve McGhee, Reliability Advocate at Google, to discuss his journey from early SRE work to advocating for reliability best practices. 

Steve shares fascinating stories from his time at Google, the challenges of implementing SRE in enterprises, and what people often misunderstand about the discipline. 

He also offers valuable insights on incident response, distributed systems, and the underrated skill every reliability engineer should master. Whether you're new to SRE or a seasoned professional, this conversation is packed with wisdom and practical takeaways.

This episode is also available as a video interview on YouTube.

Rootly