DiscoverCode ImpactReliability Engineering: History, Practice, and Future
Reliability Engineering: History, Practice, and Future

Reliability Engineering: History, Practice, and Future

Update: 2025-01-19
Share

Description

This podcast explores the field of reliability engineering, tracing its origins at Google with the development of Site Reliability Engineering (SRE). It differentiates reliability engineering from SRE, highlighting its broader applicability across various organisational structures. The podcast outlines four key promises of a successful reliability team: defining service levels (SLA/SLO/SLI), managing the service infrastructure, participating in technical design, and providing tactical support during incidents. Finally, it discusses the evolving landscape of reliability engineering, emphasising pragmatic approaches to balancing cost and reliability needs, and advocating for a more nuanced understanding of when to build versus buy solutions.

Comments 
00:00
00:00
x

0.5x

0.8x

1.0x

1.25x

1.5x

2.0x

3.0x

Sleep Timer

Off

End of Episode

5 Minutes

10 Minutes

15 Minutes

30 Minutes

45 Minutes

60 Minutes

120 Minutes

Reliability Engineering: History, Practice, and Future

Reliability Engineering: History, Practice, and Future

Sanket Makhija