DiscoverGoogle SRE Prodcast
Google SRE Prodcast
Claim Ownership

Google SRE Prodcast

Author: Salim Virji

Subscribed: 102Played: 1,121
Share

Description

SRE Prodcast brings Google's experience with Site Reliability Engineering together with special guests and exciting topics to discuss the present and future of reliable production engineering!
21 Episodes
Reverse
Ben Treynor Sloss (VP of Engineering, Google) joins hosts Steve McGhee and Dr. Jennifer Petoff (Director of Technical Infrastructure Education, Google) to share the evolution of SRE and its impact on software development, how AI and ML significantly impacts SRE practices, and the future of SRE. Ben coined the term "Site Reliability Engineering" for his team of (now) 4,000 software engineers, engaged in what were traditionally operations functions. Under Ben's leadership, Google SRE wrote two best-selling books on SRE. Since then, the rest of the SaaS industry has come to adopt the SRE name, mission, and practices. 
In this episode, Healfdene Goguen (Principal Engineer, Google) joins hosts Steve McGhee and Jordan Greenberg to discuss the vast amount of work to be done by SREs, and the fascinating challenges to tackle with clear real-world implications. It's a truly exciting time to be an SRE at Google!
In this season of Google Prodcast, current and former SREs, both within and outside of Google, chat with hosts Steve McGhee and Jordan Greenberg to discuss software systems designed and built by SREs.  For "episode zero", guests Amy Tobey (Live Services SRE, Netflix) and Dr. Vladyslav Ukis (Head of R&D, Siemens Healthineers, Author of "Establishing SRE Foundations") will set the stage for the season with a lively discussion about what Software Engineering means to Site Reliability Engineering.
Former Google SREs, or "Xooglers", talk with hosts MP and Steve McGhee about site reliability engineering outside of Google. What’s the difference in scale? What skills are generally valuable? And why can’t you build “SRE in a box” that jump-starts pretty much any organization? Join Carla Geisser, Cody Smith, and Laura Nolan in their lively conversation about what SRE skills and knowledge they have found useful in roles outside of Google. 
Sabrina Farmer, VP of Engineering at Google, talks about her career journey through Site Reliability Engineering.  What does management mean? What’s involved in being an effective manager? and what’s a feasibility study? Hear some great advice on how to get what you expect out of a role, wherever on the ladder it is. 
Dave Reisner talks about his path to Staff SRE, from ArchLinux contributor through DevOps to software engineer. This episode emphasizes the value of strong mentoring and manager relationships, and the challenges of work-life balance.  
Explore the role and responsibilities of an SRE manager with Stephen Benjamin.
Explore the career path of SREs Shannon Brady and Theo Klein as they discusses their paths to Site Reliability Engineering and finding their areas of expertise. 
In this episode, Mariuxi and Julian discuss their paths to SRE: what drew them initially to SRE, and what motivates them to continue developing skills  
How does one become an SRE? And what’s the career like? In this episode, Tom and Megan discuss their path to SRE.  
Host MP English and former Google SRE John Reese (JTR) chat about the creation of the Prodcast. Visit https://sre.google/prodcast for transcripts and links to further reading. View transcript
Ayelet Sachto offers advice on creating an actionable, transparent, and blameless postmortem culture. Visit https://sre.google/prodcast for transcripts and links to further reading. View transcript
Adrienne Walcer discusses how to approach and organize incident management efforts throughout the production lifecycle. Visit https://sre.google/prodcast for transcripts and links to further reading. View transcript
Andrew Widdowson (APW) shares strategies for successful on-call rotations. Visit https://sre.google/prodcast for transcripts and links to further reading. View transcript
Pierre Palatin dives into different automation strategies, how to build confidence in your system, and why designing the UI may be your biggest challenge. Visit https://sre.google/prodcast for transcripts and links to further reading. View transcript
Pavan Adharapurapu details how to approach large-scale migrations while optimizing for user experience. Visit https://sre.google/prodcast for transcripts and links to further reading. View transcript
Narayan Desai explains why SLOs can be problematic and proposes alternative methods for monitoring complex, large-scale systems. Visit https://sre.google/prodcast for transcripts and links to further reading. View transcript
Amelia Harrison advises on when and how to alert, ideal coverage, and tuning. Visit https://sre.google/prodcast for transcripts and links to further reading. View transcript
Silvia Esparrachiari talks about the challenges of monitoring and the importance of understanding your users. Visit https://sre.google/prodcast for transcripts and links to further reading. View transcript
What is SRE, anyway? Jennifer Mace (Macey) gives us her definition of "site reliability engineer," discusses how to manage risk, and shares key questions to ask developers. Visit https://sre.google/prodcast for transcripts and links to further reading. View transcript
loading