#113 - Faster Incident Response feat. Tim Armandpour // CTO @ PagerDuty

Update: 2024-12-05
Description

Planning AND Practice: The Secret to Incident Response

Plan and PRACTICE for better incident response with insights from Tim Armandpour, CTO of PagerDuty. Learn the secrets to resilience from the team that mitigated the impact of a major outage—handling a 250% traffic surge while delivering on their SLA.


Listen to find out:



  • 🛠️ Why planning AND practice are both critical for incident response.

  • 🚧 How to practice for incident response (e.g., Failure Fridays with Chaos Engineering).

  • 🧑‍🤝‍🧑 Ownership: Why tech AND business teams must join post-mortems.

  • ☁️ How to mitigate the impact of your cloud provider’s lower SLA.

  • ⚓ Which architectural patterns are more resilient?

  • ⚖️ WARNING: “bend” the CAP theorem at your own risk



Timestamps:
(00:00:00) Introduction to Alphalist Podcast
(00:01:00) Meet Tim Armandpour
(00:01:56) Tim's Early Career
(00:06:22) Handling Major Incidents at PagerDuty
(00:09:21) The Importance of Preparedness
(00:13:54) Practicing Failure Scenarios
(00:18:16) Resilient Infrastructure and Architectural Patterns
(00:22:44) Standardization and Data Management
(00:25:48) Exploring Infrastructure Resilience
(00:26:20) Achieving High Availability with Lower SLA Cloud Platforms
(00:29:38) Defining Meaningful SLIs
(00:32:15) Assessing Incident Readiness
(00:35:15) The Importance of Ownership
(00:41:30) Continuous Improvement
(00:43:53) Lessons from a Yogurt Business
(00:48:18) Final Thoughts and Takeaways

Tobias Schlottke - alphalist CTO Podcast