The Internet Report
Author: ThousandEyes
Description
This is the Internet Report, where we uncover what’s working and what’s breaking on the Internet—and why. Catch our Outage Deep Dive series for special coverage of Internet outages. We go under the hood to determine what happened, covering key lessons and ways IT teams can minimize downtime in similar situations.
Also tune in every other week for our Pulse Update podcast series to hear from the internet experts at ThousandEyes as they share the latest data on ISP outages, public cloud provider network outages, collaboration app network outages, and more. Then, the hosts dig into the most interesting outage events from the last few weeks.
52 Episodes
On April 4, 2023, Virgin Media UK (AS 5089) experienced two outages that impacted the reachability of its network and services to the global Internet. The two outages shared similar characteristics, including the withdrawal of routes to its network, traffic loss, and intermittent periods of service recovery. In this episode, we discuss how the outages unfolded and what IT teams can learn from this to help navigate similar incidents in the future. To learn more, check out the links below:
- Blog: Virgin Media UK Outage Analysis: https://www.thousandeyes.com/blog/virgin-media-uk-outage-analysis-april-4-2023
- Explore the Virgin Media UK outage in the ThousandEyes platform (NO LOGIN REQUIRED): https://aoqqallcjrsdrxwjpdiizyxvnimmjgde.share.thousandeyes.com/
———
CHAPTERS
00:00 Intro
1:08 Overview: Virgin Media UK Outage
3:29 Under the Hood: Virgin Media UK Outage
———
Want to get in touch?
If you have questions, feedback, or guests you would like to see featured on the show, send us a note at InternetReport@thousandeyes.com. And follow us on Twitter: @thousandeyes
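Side note for readers who want a rough, do-it-yourself view of the kind of flapping reachability described above: the short Python sketch below (standard library only; the target host, port, and probe interval are placeholder assumptions, and this is not how ThousandEyes measures reachability) simply logs transitions between reachable and unreachable states for a TCP endpoint.

# flap_monitor.py - log transitions between reachable and unreachable states
# for a TCP service. Purely illustrative; target and interval are assumptions.
import socket
import time
from datetime import datetime, timezone

TARGET_HOST = "www.example.com"   # placeholder target
TARGET_PORT = 443
PROBE_INTERVAL_S = 30

def is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def main():
    last_state = None
    while True:
        state = is_reachable(TARGET_HOST, TARGET_PORT)
        if state != last_state:
            stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
            label = "REACHABLE" if state else "UNREACHABLE"
            print(f"{stamp} {TARGET_HOST}:{TARGET_PORT} is now {label}")
            last_state = state
        time.sleep(PROBE_INTERVAL_S)

if __name__ == "__main__":
    main()

Run against an endpoint you own, and the timestamps of the REACHABLE/UNREACHABLE transitions give a crude picture of the intermittent recovery windows discussed in the episode.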
Live from #CiscoLiveEMEA, we discuss the Feb. 7 Microsoft Outlook outage to understand how the event unfolded, why it may have played out the way it did, and what you can learn from this outage event. To dive deeper, check out the links below:
Explore the outage in the ThousandEyes platform (NO LOGIN REQUIRED)
Microsoft Outlook Outage Analysis Blog (Feb. 7)
Microsoft Outage Analysis Blog (Jan. 25)
Want to get in touch?
If you have questions, feedback, or guests you'd like to see featured on the show, send us a note at InternetReport@thousandeyes.com. And follow us on Twitter: @thousandeyes.
Get your free T-shirt:
Every new subscriber gets a ThousandEyes T-shirt. Just send your address and T-shirt size to us at InternetReport@thousandeyes.com.
At around 7:05 a.m. UTC on January 25, 2023, Microsoft started experiencing service-related issues. At the same time, ThousandEyes observed BGP withdrawals and a significant number of route changes that resulted in a high amount of packet loss, ultimately affecting various services like Outlook, Teams, SharePoint, and others.
00:00 Welcome: This is The Internet Report, where we uncover what’s working and what’s breaking on the Internet—and why. Join our co-hosts Angelique Medina, Head of Internet Intelligence, and Kemal Sanjta, Principal Internet Architect, both from ThousandEyes, as they discuss the January 25, 2023 Microsoft outage.
00:43 Incident overview as seen in the ThousandEyes platform. Follow along in the ThousandEyes platform (no login required): https://acimfsmgobnxgdkltxicdesijrowimst.share.thousandeyes.com
10:18 Why would rapid announcement changes cause packet loss at the scale that was observed during this event?
12:35 What should someone make of the length/duration of the event?
14:28 Why would the change of an IP address on a router cause such a major connectivity disruption?
23:37 What are some of the lessons you can learn from this event?
Questions? Feedback? Send us an email at internetreport@thousandeyes.com
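For listeners who want to poke at public BGP data around an event like this themselves, a minimal sketch along the following lines uses the open-source pybgpstream library to count announcements versus withdrawals seen for a prefix during the incident window. The prefix, the RIPE RIS collector, and the time window are illustrative assumptions, and this is not the analysis method used on the show.

# Count BGP announcements vs. withdrawals for an example prefix during the
# incident window, using public route collector data via pybgpstream.
# Prefix, collector, and window are illustrative assumptions.
# Requires: pip install pybgpstream (plus the libBGPStream C library)
import pybgpstream

stream = pybgpstream.BGPStream(
    from_time="2023-01-25 07:00:00",
    until_time="2023-01-25 08:00:00",
    collectors=["rrc00"],                  # a RIPE RIS collector, chosen arbitrarily
    record_type="updates",
    filter="prefix more 204.79.197.0/24",  # example Microsoft prefix; substitute your own
)

announcements = withdrawals = 0
for elem in stream:
    if elem.type == "A":       # announcement
        announcements += 1
    elif elem.type == "W":     # withdrawal
        withdrawals += 1

print(f"announcements={announcements} withdrawals={withdrawals}")

A burst of withdrawals followed by rapid re-announcements over a short window is the kind of churn discussed around the 10:18 mark.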
Starting at ~12:12 UTC on Dec 12, 2022, an ISP in the Democratic Republic of Congo leaked a route belonging to the Quad9 DNS service, causing some traffic, including Verizon US customer traffic, to get routed to Africa for ~90 minutes. High traffic loss was observed throughout the incident, which was resolved at ~13:40 UTC.
00:00 Welcome: This is The Internet Report, where we uncover what’s working and what’s breaking on the Internet—and why. Join our co-hosts Mike Hicks, Principal Solutions Analyst, and Kemal Sanjta, Principal Internet Architect, both from ThousandEyes, as they discuss the December 12th Quad9 BGP route leak.
00:56 Under the Hood: Reviewing the Quad9 BGP route leak as seen in the ThousandEyes platform. Explore the incident yourself in the ThousandEyes platform at: https://aioiqfxeunngihtwnkphnuzazgloaiju.share.thousandeyes.com/
Questions? Feedback? Send us an email at internetreport@thousandeyes.com
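If you want a simple client-side view of an incident like this, a small sketch using the dnspython library can time lookups against Quad9 directly. The query names and timeouts below are arbitrary examples, not anything taken from the episode.

# Measure DNS resolution latency against Quad9 (9.9.9.9).
# Requires: pip install dnspython
import time
import dns.exception
import dns.resolver

resolver = dns.resolver.Resolver(configure=False)
resolver.nameservers = ["9.9.9.9"]   # Quad9 public resolver
resolver.timeout = 2.0
resolver.lifetime = 4.0

def timed_lookup(name):
    """Return resolution time in milliseconds, or None on failure."""
    start = time.monotonic()
    try:
        resolver.resolve(name, "A")
    except dns.exception.DNSException:
        return None
    return (time.monotonic() - start) * 1000.0

for qname in ["example.com", "example.org", "example.net"]:
    latency = timed_lookup(qname)
    if latency is None:
        print(f"{qname}: lookup via 9.9.9.9 failed")
    else:
        print(f"{qname}: {latency:.1f} ms via 9.9.9.9")

During the leak window, queries like these from affected networks would have shown elevated latency or outright failures as responses detoured through Africa.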
This is The Internet Report, where we uncover what’s working and what’s breaking on the Internet—and why. In this episode, we unpack four notable outages that impacted WhatsApp, Zscaler, Salesforce, and Facebook, which all appear to have a common theme. Join our co-hosts Mike Hicks, Principal Solutions Analyst at ThousandEyes, and Chris Villemez, Technical Marketing Engineer at ThousandEyes, as they walk through each incident to understand what happened and discuss how network professionals can attempt to mitigate these types of scenarios in the future.
FURTHER READING
Facebook Outage Analysis → https://www.thousandeyes.com/blog/facebook-outage-analysis
We're back!
00:00 Welcome: This is The Internet Report, where we uncover what’s working and what’s breaking on the Internet—and why. On this episode, our newest host, Chris Villemez, is joined by Kemal Sanjta to discuss a BGP-related incident that took down Twitter for many users around the globe on March 28th.
00:36 Under the Hood: Chris Villemez and Kemal Sanjta leverage their extensive operations experience managing the networks of large-scale SaaS, IoT, and cloud providers to analyze this incident using the ThousandEyes platform. They examine the scope of the outage, review the specific BGP changes that resulted in the outage, and discuss what enterprises can do when they’re experiencing a similar BGP hijack or route leak (a minimal origin-AS check is sketched after the outro below).
Sharelinks:
Single agent (Manchester) test: https://anislusvvn.share.thousandeyes.com/
Multi-agent global test showing BGP changes: https://axntbxntyk.share.thousandeyes.com/
31:00 Outro: We've been on a bit of a break for the past several months, as things were relatively quiet on the Internet front. For the foreseeable future we'll be a bit more reactive with our episodes, but when something major happens, trust that we'll be here. Questions? Feedback? Have an idea for a guest? Send us an email at internetreport@thousandeyes.com
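Following up on the hijack/route-leak question from the Under the Hood segment above, here is a minimal, purely illustrative Python check that compares the origin AS seen in BGP updates against the origins you expect for your own prefixes. All prefixes and ASNs are invented, and how you collect the updates is left out of the sketch.

# Flag BGP announcements whose origin AS is not in the expected set
# for a prefix. Input data here is hypothetical, for illustration only.

EXPECTED_ORIGINS = {
    "192.0.2.0/24": {64500},           # example prefix -> expected origin ASN(s)
    "198.51.100.0/24": {64501, 64502},
}

observed_updates = [
    # (prefix, AS path as announced by a peer)
    ("192.0.2.0/24", [64510, 64505, 64500]),   # legitimate origin 64500
    ("192.0.2.0/24", [64510, 64999]),          # unexpected origin 64999
    ("198.51.100.0/24", [64520, 64502]),
]

def check_origin(prefix, as_path, expected):
    """Return None if the origin (last ASN in the path) is expected, else the rogue ASN."""
    if not as_path:
        return None
    origin = as_path[-1]
    if prefix in expected and origin not in expected[prefix]:
        return origin
    return None

for prefix, path in observed_updates:
    rogue = check_origin(prefix, path, EXPECTED_ORIGINS)
    if rogue is not None:
        print(f"possible hijack: {prefix} announced with origin AS{rogue}, path {path}")

In practice, RPKI route-origin validation does this far more rigorously; the sketch is only meant to show the shape of the check.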
This is The Internet Report, where we uncover what’s working and what’s breaking on the Internet—and why. On today’s episode, our newest host and Technical Marketing Engineer, Chris Villemez, is joined by Kemal Sanjta, Principal Engineer, to dive into the details of the recent AWS outages from December 7th, 10th and 15th. They’ll walk through what ThousandEyes saw from its fleet of vantage points, as well as share some insight into what enterprises can learn from these incidents to build resilient cloud architectures.
00:00 Welcome: This is The Internet Report, where we uncover what’s working and what’s breaking on the Internet—and why.
00:15 Headlines: Today we’re going to do a thorough analysis of the major Facebook outage that took place yesterday, Monday, October 4. I’m joined by Gustavo Ramos, ThousandEyes’ in-house expert on Network Engineering.
ThousandEyes Blog: https://www.thousandeyes.com/blog/facebook-outage-analysis
Analysis from Facebook: https://engineering.fb.com/2021/10/05/networking-traffic/outage-details/
1:17 Under the Hood: We'll walk through the sequence of events that led to this outage, understand what went wrong (and what actions may have made the situation worse), and discuss what lessons we can all learn from it.
25:40 Outro: We've been on a bit of a break for the past several months, as things were relatively quiet on the Internet front. For the foreseeable future we'll be a bit more reactive with our episodes, but when something major happens, trust that we'll be here. Questions? Feedback? Have an idea for a guest? Send us an email at internetreport@thousandeyes.com
00:00 Welcome: This is The Internet Report, where we uncover what’s working and what’s breaking on the Internet—and why.
00:08 Headlines: Today, Mike Hicks (Principal Solutions Analyst, ThousandEyes) and I discuss a recent BGP routing incident that had intermittent impacts on Amazon’s services, including Amazon.com and AWS compute resources, during a five-hour period on July 12.
01:04 Under the Hood: When we look into BGP routing at the time, we can see multiple BGP path changes due to a service provider erroneously inserting themselves into the path for a large number of Amazon routes. Watch this episode to see how the BGP incident led to significant packet loss, resulting in service disruption for some Amazon and AWS users. We also discuss why enterprises need to have continuous oversight of the paths their traffic takes over the Internet (a simplified version of that path check is sketched after the outro below).
17:58 Outro: Questions? Feedback? Have an idea for a guest? Send us an email at internetreport@thousandeyes.com
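Here is the simplified path check mentioned above: a purely illustrative Python snippet that flags AS paths containing a transit ASN you never expect to see between you and a destination. All ASNs, prefixes, and paths are invented for the example.

# Scan observed AS paths for an ASN that is not expected to provide transit
# toward a given destination. All values are hypothetical examples.

UNEXPECTED_TRANSIT_ASNS = {65001}   # ASNs we never expect to appear mid-path

observed_paths = [
    ("203.0.113.0/24", [64510, 64520, 64530]),          # normal path
    ("203.0.113.0/24", [64510, 65001, 64520, 64530]),   # leaker inserted mid-path
]

def leaked_through(as_path, suspicious_asns):
    """Return the set of suspicious ASNs appearing as transit (not origin) in the path."""
    transit_hops = set(as_path[:-1])   # everything except the origin AS
    return transit_hops & suspicious_asns

for prefix, path in observed_paths:
    hits = leaked_through(path, UNEXPECTED_TRANSIT_ASNS)
    if hits:
        print(f"route leak suspected for {prefix}: transit via AS{sorted(hits)} in path {path}")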
This is The Internet Report, where we uncover what’s working and what’s breaking on the Internet—and why. I’m joined today by Mike Hicks, Principal Solutions Analyst here at ThousandEyes, to cover the outage of Akamai’s DNS service. The outage, which occurred on July 22nd around 3:38 PM UTC (8:38 AM PT), struck during business hours in Europe and North America, resulting in widespread impacts to applications and services hosted on Akamai servers. The outage itself was short-lived and was resolved roughly one hour after it began.
In this episode, we examine the customer impact, the relationship between DNS and CDNs, and what enterprises should take away from the incident. We also discuss the question of when it might make sense to invest in DNS or CDN redundancy—and when it is, frankly, overkill. Watch this week’s episode to hear our take, and as always let us know on Twitter what you think.
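On the DNS-redundancy question, one lightweight sanity check is to query each of a zone's authoritative name servers directly and confirm they all answer. The sketch below uses the dnspython library; the domain is a placeholder and the check is deliberately simplistic.

# Query every authoritative name server for a zone and report which ones answer.
# Requires: pip install dnspython
import dns.message
import dns.query
import dns.rdatatype
import dns.resolver

DOMAIN = "example.com"   # placeholder zone

# Find the zone's NS records, then resolve each NS hostname to an address.
ns_names = [str(r.target) for r in dns.resolver.resolve(DOMAIN, "NS")]

for ns_name in ns_names:
    try:
        ns_ip = next(iter(dns.resolver.resolve(ns_name, "A"))).to_text()
        query = dns.message.make_query(DOMAIN, "A")
        response = dns.query.udp(query, ns_ip, timeout=3.0)
        answered = any(rrset.rdtype == dns.rdatatype.A for rrset in response.answer)
        print(f"{ns_name} ({ns_ip}): {'answered' if answered else 'no A answer'}")
    except Exception as exc:
        print(f"{ns_name}: query failed ({exc})")

If every authoritative server for your zone sits with a single provider, a provider-level outage takes them all out at once, which is exactly the trade-off weighed in this episode.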
00:00 Welcome: This is The Internet Report, where we uncover what’s working and what’s breaking on the Internet—and why.
00:13 Headlines: Today, Kemal and I unpack an interesting BGP incident, in which a large-scale route leak briefly altered traffic patterns across the Internet.
00:58 Under the Hood: The incident began on Thursday, June 3rd at around 10:24 UTC, and resulted in a significant spike in packet loss that was noticeable in ThousandEyes tests. While this packet loss resolved within the hour (at around 10:48 UTC), we observed some interesting routing changes during this window—as traffic was diverted to a Russian telecom provider that had not previously been in the path. Watch this episode as we explore how this network provider managed to get itself into the routing paths of many major services, and why network visibility is so important to recognize these types of incidents in which your site may still be reachable but your traffic is being sent through an unexpected network.
20:45 Outro: Questions? Feedback? Have an idea for a guest? Send us an email at internetreport@thousandeyes.com
This is The Internet Report, where we uncover what’s working and what’s breaking on the Internet—and why. I’m joined by ThousandEyes’ BGP expert, Kemal Sanjta, to review the June 16th outage of Prolexic Routed, a DDoS Mitigation Service operated by Akamai. According to a statement from Akamai, the outage was not due to a DDoS attack or system update, but instead a routing table limitation that was inadvertently exceeded.
In this episode, Kemal and I analyze what happened and how customers of Akamai Prolexic who had automated failover mechanisms in place were able to recover more quickly than those that had to manually switch over to other providers. Watch this episode to learn more about this outage, and how different operational processes resulted in very different service outcomes.
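The automated-failover point generalizes well beyond DDoS mitigation. As a toy illustration (Python standard library only; the health-check URL, thresholds, and the switch_to_backup() action are hypothetical placeholders, not anything Prolexic-specific), automated failover is essentially a health check wired to a pre-agreed action.

# Toy automated-failover loop: probe a primary endpoint and trigger a
# pre-defined fallback action after consecutive failures.
# Endpoints and the fallback action are hypothetical placeholders.
import time
import urllib.error
import urllib.request

PRIMARY_URL = "https://primary.example.com/healthz"   # placeholder health endpoint
FAILURES_BEFORE_FAILOVER = 3
CHECK_INTERVAL_S = 15

def primary_healthy(url, timeout=5.0):
    """Return True if the primary health endpoint responds with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False

def switch_to_backup():
    """Placeholder: in reality this might update DNS, withdraw/announce routes, etc."""
    print("failover triggered: switching traffic to backup path")

def main():
    consecutive_failures = 0
    while True:
        if primary_healthy(PRIMARY_URL):
            consecutive_failures = 0
        else:
            consecutive_failures += 1
            print(f"health check failed ({consecutive_failures}/{FAILURES_BEFORE_FAILOVER})")
            if consecutive_failures >= FAILURES_BEFORE_FAILOVER:
                switch_to_backup()
                break
        time.sleep(CHECK_INTERVAL_S)

if __name__ == "__main__":
    main()

The customers who recovered quickly in this incident had, in effect, already decided what switch_to_backup() meant for them and let software pull the trigger.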
00:00 Welcome: This is The Internet Report, where we uncover what’s working and what’s breaking on the Internet—and why.
00:12 Headlines: Today, I’m joined by Hans Ashlock, Director of Technology & Innovation at ThousandEyes, to unpack today’s major outage at Fastly, a popular CDN provider.
3:46 Under the Hood: The widespread outage occurred around 9:50 UTC (about 5:50 AM ET) and, due to the timing, mostly impacted users across Europe and Asia. The outage lasted approximately one hour, until 10:50 UTC, yet residual impacts were felt beyond that. Today’s outage is a good example of the importance of having outside-in visibility not just across your app, but also to your app’s edge and all its dependent services (a short outside-in probe is sketched after the outro below).
39:05 Outro: Questions? Feedback? Have an idea for a guest? Send us an email at internetreport@thousandeyes.com
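Here is the short sketch referenced in the Under the Hood notes above: an outside-in probe that fetches a URL through its CDN and prints status, timing, and a few response headers CDNs commonly use to identify the serving edge. The URL is a placeholder and header names vary by provider.

# Fetch a URL from the outside and print status, timing, and common CDN headers.
# The URL is a placeholder; header names vary by CDN provider.
import time
import urllib.error
import urllib.request

URL = "https://www.example.com/"   # placeholder site behind a CDN

start = time.monotonic()
try:
    with urllib.request.urlopen(URL, timeout=10) as resp:
        elapsed_ms = (time.monotonic() - start) * 1000.0
        print(f"{URL} -> HTTP {resp.status} in {elapsed_ms:.0f} ms")
        for header in ("x-served-by", "x-cache", "via", "server"):
            value = resp.headers.get(header)
            if value:
                print(f"  {header}: {value}")
except urllib.error.URLError as exc:
    elapsed_ms = (time.monotonic() - start) * 1000.0
    print(f"{URL} -> failed after {elapsed_ms:.0f} ms: {exc.reason}")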
This is The Internet Report, where we uncover what’s working and what’s breaking on the Internet—and why. I’m joined today by Mike Hicks, Principal Solutions Analyst at ThousandEyes, to cover two recent application-related outages. The first occurred on May 19th around 12:50 UTC at Coinbase—a well-known cryptocurrency exchange. Around the time news broke that the Chinese government would be imposing strict regulation on cryptocurrencies, users attempting to execute transactions were unable to access the application. From the ThousandEyes platform we were able to see a drop in availability around this time, as well as increased load times (which in some cases resulted in timeout errors).
The second outage happened on May 20th around 17:35 UTC at Slack—an enterprise collaboration platform. While the outage was resolved within 90 minutes, it occurred during normal US business hours, making it particularly disruptive to users attempting to reach the application. These instances remind us that applications, much like the underlying networks they run on, can experience outages, and effective troubleshooting requires end-to-end visibility into both.
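One way to get that end-to-end view on the cheap is to break a single request into its DNS, TCP, TLS, and HTTP phases. The Python standard-library sketch below does exactly that; the hostname is a placeholder, and real monitoring measures this far more carefully.

# Break one HTTPS request into DNS, TCP connect, TLS handshake, and HTTP phases.
# Hostname is a placeholder; this is a rough illustration, not a monitoring tool.
import socket
import ssl
import time

HOST = "www.example.com"   # placeholder
PORT = 443

def ms_since(t0):
    return (time.monotonic() - t0) * 1000.0

# DNS resolution
t0 = time.monotonic()
addr = socket.getaddrinfo(HOST, PORT, proto=socket.IPPROTO_TCP)[0][4][0]
print(f"DNS:  {ms_since(t0):.1f} ms ({addr})")

# TCP connect
t0 = time.monotonic()
sock = socket.create_connection((addr, PORT), timeout=10)
print(f"TCP:  {ms_since(t0):.1f} ms")

# TLS handshake (SNI and certificate checks use the hostname, not the IP)
t0 = time.monotonic()
tls = ssl.create_default_context().wrap_socket(sock, server_hostname=HOST)
print(f"TLS:  {ms_since(t0):.1f} ms")

# HTTP request and time to first byte
t0 = time.monotonic()
request = f"GET / HTTP/1.1\r\nHost: {HOST}\r\nConnection: close\r\n\r\n"
tls.sendall(request.encode())
first_chunk = tls.recv(4096)
status_line = first_chunk.split(b"\r\n", 1)[0].decode(errors="replace")
print(f"HTTP: {ms_since(t0):.1f} ms to first byte, status line: {status_line}")
tls.close()

If the slowdown shows up in the HTTP phase while DNS, TCP, and TLS stay flat, the problem is likely in the application rather than the network, which is the distinction this episode keeps coming back to.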
00:00 Welcome
00:14 Headlines: DNS and BGP and DDoS Attacks—Oh, My! This week we cover a couple of recent service degradation incidents involving DNS providers.
2:19 Under the Hood: Kemal Sanjta, ThousandEyes’ resident BGP expert, joins us to discuss the May 6th disruption to Neustar’s UltraDNS service, which lasted nearly four hours. We discuss the BGP routing changes we observed during the incident and what they can tell us about the cause of the disruption. We also cover a separate incident involving Quad9, a public recursive resolver service, which the company says was caused by a DDoS attack on May 3rd.
16:19 Expert Spotlight: Michael Batchelder (a.k.a. Binky) is here to discuss the two “Ds” of the Internet: DDoS attacks and the DNS. Questions for Binky? Contact him at binky@thousandeyes.com
31:49 Outro: Questions? Feedback? Have an idea for a guest? Send us an email at internetreport@thousandeyes.com
This is The Internet Report, where we uncover what’s working and what’s breaking on the Internet—and why. Today, we focus on an interesting outage that impacted Cloudflare Magic Transit, a relatively new offering from the CDN provider that aims to efficiently route and protect the network traffic of its customers. On May 3rd at approximately 3:00 PM PDT (10:00 PM UTC), ThousandEyes vantage points connecting to sites using Magic Transit began to detect significant packet loss at Cloudflare’s network edge—with the loss continuing at varying levels for approximately two hours.
While the outage impacted some Magic Transit customers more significantly than others, we also observed mitigation actions by at least one customer to avoid the outage and restore the availability of their service to their users. This outage reminds us that no provider is immune to outages, even cloud and global CDN providers. However, with proactive visibility, you can respond quickly to reduce outage impact on your users. Watch this week’s episode to hear more about the outage from the ThousandEyes perspective.
This is The Internet Report, where we uncover what’s working and what’s breaking on the Internet—and why. We’re joined this week by Hans Ashlock, Director of Technology & Innovation at ThousandEyes, to discuss Tuesday’s Microsoft Teams outage. On Tuesday, April 27th, ThousandEyes tests began to detect an outage affecting the Teams service starting around 3 AM (PT) and lasting approximately 1.5 hours. While the outage occurred in the overnight hours for much of the Americas, the global nature of the outage resulted in service disruption for users connecting from Asia and Europe.
Transaction views within the ThousandEyes platform show that Microsoft’s authentication service appeared to be available, however, the Teams application was unable to initialize, resulting in error responses. Watch this week’s episode to hear more about what ThousandEyes revealed about the nature of this outage—and what we can all learn from the incident.
This is The Internet Report, where we uncover what’s working and what’s breaking on the Internet—and why. On today’s episode, we’re thrilled to be joined by Kemal Sanjta, ThousandEyes’ resident expert on BGP. This week, we’re going under the hood on the April 16th BGP leak at Vodafone India, which leaked more than 30,000 prefixes, causing a major disruption of Internet traffic to some services. While some news outlets reported that the incident lasted approximately 10 minutes (starting around 1:50 PM UTC, or 9:50 AM ET), we found that it lasted quite a bit longer—more than an hour in the case of some prefixes. Watch this week’s show to see how it impacted a major CDN provider.
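Measuring how long a leak like this really lasts comes down to tracking, per prefix, when the leaked announcements were first and last seen. A toy pure-Python version of that bookkeeping, over invented update records, looks like this:

# Track first-seen / last-seen times of leaked announcements per prefix.
# The update records below are invented; timestamps are UNIX epoch seconds.
from collections import defaultdict

LEAKER_ASN = 64999   # hypothetical leaking AS

updates = [
    # (timestamp, prefix, AS path)
    (1000, "192.0.2.0/24",    [64510, 64999, 64500]),
    (1060, "198.51.100.0/24", [64510, 64999, 64501]),
    (4800, "192.0.2.0/24",    [64510, 64999, 64500]),
    (5000, "198.51.100.0/24", [64510, 64501]),          # clean path, not counted
]

seen = defaultdict(lambda: [None, None])   # prefix -> [first_seen, last_seen]
for ts, prefix, path in updates:
    if LEAKER_ASN in path[:-1]:            # leaker present as transit
        first, last = seen[prefix]
        seen[prefix] = [ts if first is None else first, ts]

for prefix, (first, last) in sorted(seen.items()):
    print(f"{prefix}: leaked announcements seen over {last - first} seconds")

Done per prefix rather than per incident, this is why the episode's "more than an hour for some prefixes" and the reported "about 10 minutes" can both be true at the same time.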
This is The Internet Report, where we uncover what’s working and what’s breaking on the Internet—and why. We’re back from a short sabbatical to cover an interesting incident at Facebook: what appears to be an application outage compounded by a series of routing issues. On April 8th, for roughly 40 minutes, the Facebook application became unavailable for users around the globe who were attempting to connect to the service. Despite the short-lived nature of the outage, we observed prolonged performance degradation even after the application came back online for users. Suboptimal page load and response times, both of which can impact the user experience, were observed alongside a series of routing changes. This outage reminds us all of the importance of having visibility across network and application layers when troubleshooting and prioritizing issues that are impacting user experience. Catch this week’s episode to hear about the outage from the ThousandEyes perspective.
On today’s episode, we discuss the recent outage on Verizon’s network that had widespread impacts on users in the US. ThousandEyes Broadband Agents detected an outage starting around 11:30 AM EST that manifested as packet loss across multiple locations concentrated along Verizon’s backbone on the US East Coast and in the Midwest. While the outage was resolved approximately an hour later, users connecting from the Verizon network across the US experienced varying degrees of impact, depending on the services they were connecting to. This serves as yet another reminder that the context around an outage directly affects the scope of the disruption. Watch this week’s episode to see what this outage looked like from ThousandEyes’ vantage points.