Slight Reliability

Author: Stephen Townshend

Description

Learning SRE, one day at a time.
120 Episodes
This week I sit down and have a discussion with Amin Astaneh (from Certo Modo) about CI/CD. We cover the power of the standard change as a way to navigate ITIL while still implementing DevOps practices, what to monitor to make your CI/CD observable, single piece flow, testing in production, and so much more. You can find Amin on his company website https://certomodo.io, LinkedIn: https://www.linkedin.com/in/aminastaneh/ and Twitter: https://twitter.com/aastaneh You can find the ...
"Environment issues are just incidents that happened to occur in a non-production environment"... so why do we treat them so differently? In this first episode of the 2024 season I reflect on how we handle incidents in non-prod environments. (Note: Had a few issues with noise suppression in OBS Studio cutting off the start of some words, will sort it for the next episode) You can find Stephen at: LinkedIn: https://www.linkedin.com/in/stephentownshend/ Twitter: https://twitter....
How do you ingest and store petabytes of telemetry every day in a cost-effective and high-performing way? How can you do this in a way which gives engineers the operational data they need to keep services running? How has this challenge been tackled in the past and what's been the evolution? This week I'm joined by Observe co-founder Jacob Leverich to go deep into this topic. We discuss... 💾 A deep-dive into the evolution of telemetry storage and where it's going 💽 The advent of gen...
How do you take all the utopian ideas you read about in books and apply them to the reality of the organisations we work in? This week I'm joined by leader, mentor, and coach Rob Roe to tackle this question. We discuss... 🌪️ The pitfalls of functional silos 🤫 Is the annual budget a load of rubbish? 🏃 How our management promotion systems are often broken 🫂 The power of virtual teams 📗 Team interaction models ...and much more. You can find Rob on... LinkedIn: https://www.linkedin.co...
We spend a third of our life at work. It needs to be something we enjoy and something with purpose. Our work experience also impacts our family, friends, and our personal lives. This week I'm joined by tech engineer, leader, and author Richard Bown to explore this and many other topics including... 🌪️ The difficulty in applying the ideas we read in books in real organisations 🤫 When you want to implement a thing you can't talk directly about the thing 🏃 Does change require senior ...
When you become a people leader there is no manual. How can we not only learn leadership skills but practice them and build leadership muscle? This week I'm joined by Orion Group Limited co-founder Xiao Zhang to discuss... 👑 The challenge of transitioning into people leadership 💪 How we don't get fit by watching other people work out ⌚ Pausing as an act of active leadership 🌌 The power of slack time for creativity and systems thinking 🌊 Going below the waterline ...and much more. ...
This week I kick off the 2026 season with some news and we explore how to prepare for a new role. You can buy Slight Reliability merch here (Note: you cannot order the mugs outside of New Zealand): https://slightreliability.digitees.co.nz/ You can find Stephen on: LinkedIn: https://www.linkedin.com/in/stephentownshend/ Bluesky: https://bsky.app/profile/slightreliability.bsky.social YouTube: https://www.youtube.com/c/SlightReliability Instagram: https://www.instagram.com/slight_rel...
From the day we invented computers we've been struggling to keep applications running and delivering services to the business. Is this latest wave of AI helping or hurting us? This week I'm joined by Causely founder Shmuel Kliger to dive into... 🌊 The three waves of AI hype over the decades (the history of AI) ☠️ The dangers of over-promising and under-delivering what AI can do 🧠 What is causal reasoning? 😱 Is AI replacing SREs? 🔮 AI as a way to allow humans to solve higher level ...
What is operational intelligence and how is it different from observability or BI? This week I'm joined by SquaredUp's VP of Innovation Adam Kinniburgh to answer that question and many more including... ❓ What is operational intelligence? 🙈 Relating observability back to customer, business, or revenue 😎 The value of giving stakeholders confidence 🌉 Who bridges the gap between tech and business or engineers and leadership? 🦋 Correlation VS causation and our innate desire to build c...
How does leading platform teams differ from leading product teams? This week I'm joined by experienced technology leader Dinesh Sukhija to answer that question and many more including... ❓ What is a platform team? ⚽ Coaching engineers to focus on outcomes ☀️ Connecting platform initiatives to business goals ✋ Identifying the limiters in your team 🎤 Spreading knowledge and avoiding single points of failure ...and much more. You can find Dinesh on: LinkedIn: https://www.linkedin.com...
How have my first two years as a manager in tech been? What have I learned? What do I need to work on? This week I share my experiences over the past couple of years. I cover: 🔥 My recent close call with burnout 🫶 How I attempted to build a team culture 💪 The importance of tough conversations 🥱 How roles and responsibilities might be boring to think about but are critical ❓ What's next? ...and much more. You can find me on: LinkedIn: https://www.linkedin.com/in/stephentownshend/ Blu...
How could AI help human beings negotiate the mountains of telemetry we collect to get simple and fast insight? This week I'm joined by Ottermon AI CEO and founder Checo Pacheco to talk about the lifecycle of observability coverage and tooling within organisations and how AI is helping to find signals amongst the noise and reduce cognitive load for SREs. We discuss... 🎂 The need for a layer of logic on top of our telemetry data 🚲 The observability lifecycle of a DevOps team 🎶 How most orgs...
What is chaos engineering and how is it being used in 2025? This week I'm joined by Gremlin CEO and founder Kolton Andrus to discuss... 🌪️ What is chaos engineering and what are its origins? 🪴 How has it evolved over the years? 🤖 The role of AI agents in SRE work 💰 Justifying the value of chaos engineering 🏃‍♀️‍➡️ How do I get started? ...and much more. You can find Kolton on: LinkedIn: https://www.linkedin.com/in/kolton-andrus-77315a2/ And you can find out more about Gremlin's new ...
What are Team Topologies? How can they be used to deliver value more simply and more effectively (and in a more humane way)? This week I'm joined by Luke McManus to discuss... ⛰️ What are the four team topologies? 🏆 Can we have too much collaboration? ⌚ Team interaction models 🌏 Cognitive load 🏃‍♀️‍➡️ Value dynamics mapping ...and much more. You can find Luke on: LinkedIn: https://www.linkedin.com/in/luke-mcmanus-agile/ Check out the recently released second edition of the Team Topolo...
How do you begin contributing to an open source project? What's it like? What do you get out of it? This week I'm joined by Wendy Ha who shares her unique story of joining the Kubernetes project and becoming a contributor. We explore... ⛰️ What it's like working on one of the biggest open source projects in the world 🏆 The benefits of contributing to open source ⌚ How much time and effort does it take? 🌏 The unique challenges of contributing from APAC (and the need for more contri...
As an #SRE how do you influence senior leadership to get support and priority for the things you care about? To answer this question I'm joined by Nora Jones, founder of Jeli and now Head of Pricing, Product Strategy and Growth at PagerDuty. Our conversation touches on... 🤝 How understanding needs to flow both ways (between engineers and leaders) 🎨 Reliability is as much an art as a science 📝 Using napkin math to start conversations 🧠 Understand the system (your org) before trying...
This week I do a retrospective on the Slight Reliability podcast. 👂 How many people listen to it? ❤️ How do I feel about the show? 🎉 What's going well? 🪴 What could be better? ❔ What's next for the show? If you want to check out the podcast that came before Slight Reliability, you can find Performance Time archived on YouTube here: https://www.youtube.com/@performance-time You can find Stephen on: LinkedIn: https://www.linkedin.com/in/stephentownshend/ Bluesky: https://bsky.app/pr...
Have you burned out at work? What was your experience? How did you work through it? This week I'm joined by the incredible Colette Alexander to discuss what burnout is, what it means, and we both share our personal experiences burning out at work. We cover... 🔥 What is burnout? ❓ Why does it happen? 🫀 What are the symptoms? 🥊 Fight, flight, or freeze 🧑‍🚒 Advice on how to recover ...and much more. Resources from the show... Why you're so angry at work (and what to do about it) by N...
This week I'm joined by the wonderful Hanson Ho to discuss the unique challenges and opportunities in making our mobile apps observable! We cover... 📱 The mobile/backend observability divide ✏️ The challenge of distributed tracing on mobile apps 🌏 The entire device runtime environment matters for your app 👤 The quest for user-centric mobile observability ✅ Advice on how to get started with mobile observability ...and much more. You can find Hanson on: LinkedIn: https://www.linkedi...
This week I'm joined once more by SRE leader Michelle Casey who gives a broad and shallow introduction to resilience engineering. We cover... 🏋️‍♀️ Reliability VS Robustness VS Resilience 🧩 What is a complex system? 🔢 Safety one/safety two 🧠 Mental models 😩 Human error ...and so much more. Resources from this episode: Four concepts for resilience (paper) by Dr. David Woods https://www.researchgate.net/publication/276139783_Four_concepts_for_resilience_and_the_implications_f...