Inside Entra Resilience: Microsoft's Outage War Stories, Backup Secrets and Preventing Global Outages

Updated: 2025-08-23

Description

In this episode, I sit down with my boss, Tarek Dawoud, to pull back the curtain on what really happens during a major service outage.

Tarek shares some incredible "war stories" from his time in the trenches. They range from the early days of DirSync, when the team had to edit a sync file with a debugger to prevent an incident, to the massive outages of 2017 and 2018 that changed everything.

We'll give you a peek into the high-stakes, quick-thinking world of a "live site" incident and reveal the groundbreaking engineering that was born from these challenges, such as cell-based architecture and the backup authentication service, which have made Entra more resilient than ever before.

Subscribe with your favorite podcast player or watch on YouTube 👇

About Tarek Dawoud

Tarek Dawoud is a Lead Architect in the Customer Engineering team for Microsoft Entra. Having spent years growing up in Entra engineering, he has been through his share of outages and has a deep understanding of what it takes to build and maintain a resilient, hyperscale identity service.

LinkedIn - https://www.linkedin.com/in/tarekdawoud/

🔗 Related Links

* SLA performance for Microsoft Entra ID - aka.ms/entraidsla

* Microsoft Blames "Severe Weather" for Azure Cloud Outage

* Microsoft Probes Cause of Global Web Outage

* Microsoft's Azure AD authentication outage: What went wrong

📗 Chapters

00:57 What is a "Live Site"?

14:15 The Secret to Entra's Uptime: Cell-Based Architecture

18:09 How Entra Routes Your Login Request Globally

24:46 War Story #1: The 2017 Conditional Access Outage

29:52 War Story #2: How a Hurricane & an Office Bug Caused Chaos

43:39 The Backup Auth Service: Entra's Secret Weapon

57:54 Does the Backup Service Kick in Automatically?

01:04:16 Regional Isolation & The Power of Managed Identity

01:08:17 Anatomy of a Near-Outage in 2021

01:12:02 How Microsoft's Culture Learns From Mistakes

Podcast Apps

πŸŽ™οΈ Entra.Chat - https://entra.chat

🎧 Apple Podcast β†’ https://entra.chat/apple

πŸ“Ί YouTube β†’ https://entra.chat/youtube

πŸ“Ί Spotify β†’ https://entra.chat/spotify

🎧 Overcast β†’ https://entra.chat/overcast

🎧 Pocketcast β†’ https://entra.chat/pocketcast

🎧 Others β†’ https://entra.chat/rss

Merill's socials

📺 YouTube → youtube.com/@merillx

👔 LinkedIn → linkedin.com/in/merill

🐦 Twitter → twitter.com/merill

🕺 TikTok → tiktok.com/@merillf

🦋 Bluesky → bsky.app/profile/merill.net

🐘 Mastodon → infosec.exchange/@merill

🧵 Threads → threads.net/@merillf

🤖 GitHub → github.com/merill



Get full access to Entra.News - Your weekly dose of Microsoft Entra at entra.news/subscribe
Merill Fernando