Inside Entra Resilience: Microsoft's Outage War Stories, Backup Secrets and Preventing Global Outages

Update: 2025-08-23

Description

In this episode, I sit down with my boss, Tarek Dawoud, to pull back the curtain on what really happens during a major service outage.

Tarek shares some incredible "war stories" from his time in the trenches. They range from the early days of DirSync, when the team had to edit a sync file with a debugger to prevent an incident, to the massive outages of 2017 and 2018 that changed everything.

We'll give you a peek into the high-stakes, quick-thinking world of a "live site" incident. We'll also reveal the engineering principles and systems born from these challenges, such as cell-based architecture and the backup authentication service, which have made Entra more resilient than ever before.

Subscribe with your favorite podcast player or watch on YouTube 👇

About Tarek Dawoud

Tarek Dawoud is a Lead Architect in the Customer Engineering team for Microsoft Entra. Having spent years growing up in Entra engineering, he has been involved in his share of outages and has a deep understanding of what it takes to build and maintain a resilient, hyperscale identity service.

LinkedIn - https://www.linkedin.com/in/tarekdawoud/

🔗 Related Links

* SLA performance for Microsoft Entra ID - aka.ms/entraidsla

* Microsoft Blames "Severe Weather" for Azure Cloud Outage

* Microsoft Probes Cause of Global Web Outage

* Microsoft's Azure AD authentication outage: What went wrong

📗 Chapters

00:57 What is a "Live Site"?

14:15 The Secret to Entra's Uptime: Cell-Based Architecture

18:09 How Entra Routes Your Login Request Globally

24:46 War Story #1: The 2017 Conditional Access Outage

29:52 War Story #2: How a Hurricane & an Office Bug Caused Chaos

43:39 The Backup Auth Service: Entra's Secret Weapon

57:54 Does the Backup Service Kick in Automatically?

01:04:16 Regional Isolation & The Power of Managed Identity

01:08:17 Anatomy of a Near-Outage in 2021

01:12:02 How Microsoft's Culture Learns From Mistakes

Podcast Apps

๐ŸŽ™๏ธ Entra.Chat - https://entra.chat

๐ŸŽง Apple Podcast โ†’ https://entra.chat/apple

๐Ÿ“บ YouTube โ†’ https://entra.chat/youtube

๐Ÿ“บ Spotify โ†’ https://entra.chat/spotify

๐ŸŽง Overcast โ†’ https://entra.chat/overcast

๐ŸŽง Pocketcast โ†’ https://entra.chat/pocketcast

๐ŸŽง Others โ†’ https://entra.chat/rss

Merill's socials

📺 YouTube → youtube.com/@merillx

👔 LinkedIn → linkedin.com/in/merill

🐤 Twitter → twitter.com/merill

🕺 TikTok → tiktok.com/@merillf

🦋 Bluesky → bsky.app/profile/merill.net

🐘 Mastodon → infosec.exchange/@merill

🧵 Threads → threads.net/@merillf

🤖 GitHub → github.com/merill



Get full access to Entra.News - Your weekly dose of Microsoft Entra at entra.news/subscribe
Merill Fernando