We’re running a short mini-series on The Debrief podcast called Beyond the code, where we interview our engineers about what it’s really like to build at incident.io. In this episode, Product Engineer Rory B. and CTO Pete discuss how we’re using Claude Code and Git Worktrees to allow engineers to build multiple features in parallel. You can read more on our blog.
We’re running a short mini-series on The Debrief podcast called Beyond the code, where we interview our engineers about what it’s really like to build at incident.io. In this episode, we chat with Product Engineer Leo about how we’re using AI tools like Claude Code to ship more product, more quickly.
We’re running a short mini-series on The Debrief podcast called Beyond the code, where we interview our engineers about what it’s really like to build at incident.io. In this episode, we chat with Product Engineer Leo about her time building On-call, our favorite engineering tooling, and what makes our engineering culture as good as cinnamon buns.
We’re running a short mini-series on The Debrief podcast called Beyond the code, where we interview our engineers about what it’s really like to build at incident.io. In this episode, Norberto Lopes and Rory Malcolm discuss Rory's journey as a product engineer at incident.io, focusing on his experiences in the AI team and the challenges of developing the AI investigations product. They explore the engineering culture at incident.io and the impact of AI on incident management. The discussion also touches on the future of incident management and the evolving role of AI in a tech environment.
We’re running a short mini-series on The Debrief podcast called Beyond the Code, where we interview our engineers about what it’s really like to build at incident.io. In this episode, Alicia chats with Kelsey, a long-time product engineer on the Response team. They dive into Kelsey’s journey from lab automation to building core incident features, the magic (and madness) behind Scribe, and what it’s like working on a high-performing team. Expect stories about AI, workarounds, and why versioning folders manually isn’t a scalable solution.
Join us for a deep dive into how incident.io is leveraging AI to build an intelligent incident investigator. Our guests, Ed and Lawrence, share insights on building AI-powered investigations that help teams to leverage huge amounts of data and signals to respond faster and more effectively.
In this episode, Stephen, Pete and Chris take a look back at 2024 at incident.io — reflecting on the year’s personal milestones, company-wide changes, and how the product has evolved along the way. And as is customary, there's plenty of the usual good-natured humor too.
In this episode, hosts Norberto and Lawrence discuss the recent CrowdStrike incident that began on July 19th. You won't find any backseat commentary on the technical specifics, but instead a deep dive into the things we care about at incident.io, like communication, their overall response and proactive problem-solving during crises.
This week, we're talking to Sabin Roman, engineering manager at Linear, about the processes that sit behind building their product. We cover how they build teams around planned work, how their "goalie" role works to protect teams from unplanned work, the zero bugs policy they've introduced and how they ensure everyone at Linear sweats the details on their product.
This week we sit down with Hank Jacobs, Staff Site Reliability Engineer at Netflix, to discuss their deployment of incident.io across their organization. Among other things, we discuss how great UX has allowed them to roll out to hundreds of teams in months, how they have more entries in their Catalog than any other incident.io customer, and how their partnership with incident.io has been an overall game changer.
During a recent episode of The Debrief, we spoke with Jeff Forde, Architect on the Platform Engineering team at Collectors, about building an incident management program at various stages of growth. In that episode, we called it growth from zero to one, one to two, and two to three. But what happens once you’ve scaled beyond three and answers to the questions you may have become that much harder to find? To get to the bottom of this, we chatted with Oliver Tappin, Director of SRE at Eagle Eye, about what to do once your company has reached a point where there’s no precedent or roadmap, and you can’t necessarily look to others for answers.
This week, we have a really fun conversation lined up. For this episode, we chatted with Toby Jackson, Global SRE Team Lead at Future, about why it’s a bad idea to take a cookie-cutter approach to incident management or, put another way, why it’s not a good idea to treat all incidents alike. In our conversation, we discuss what’s wrong with this approach, some situations where this might actually make sense, how psychological safety factors into this conversation, and a whole lot more.
This week, we're sharing an extra special episode. It's no secret that the decision to buy or build isn't exactly a straightforward one. And the decision you make can be influenced by a ton of factors. But the fact is that in some instances, buying can make more sense than building, and in others, building can make more sense than buying. In this episode, you'll hear from John Paris, Principal Engineer at Skyscanner, to get the story behind their build versus buy journey. Joining him as the host for this episode is none other than the CPO of incident.io, Chris Evans. In their conversation, Chris and John discuss Skyscanner's setup before adopting incident.io, what life has been like after adopting the platform, and a whole lot more.
It’s fair to say that AI is here to stay. So, as companies grapple with this reality, they’re putting their best foot forward to build AI features that really make a difference for their customers. But should you be building these features if there’s no obvious fit in your product? And even if there is, are you making sure to stay true to your product principles? The reality is that deciding to build AI into your product isn’t a decision you make on a whim. There are tons of considerations around how to do it right—many of which we wrestled with ourselves when we were building our AI features just a few months ago. So, in this episode of The Debrief, we sat down with our CTO, Pete Hamilton, and Product Manager, Ed Dean, to get some perspective on how we weighed the decision to build with AI and how we thought about principles along the way.
It’s no secret that teamwork is one of those things that, when done right, can make a world of difference. So sometimes, when responding to a particularly complicated incident, it can be best to bring a team together to figure out what’s going on and work towards a fix. But it’s not enough to just jam a bunch of folks into a room and hope for the best. You need a framework in place to ensure that everyone stays focused, diagnoses the issue and resolves it as quickly as possible. And for SRE Dan Slimmon, clinical troubleshooting is just the framework to help with this. In this episode, we chat with Dan about this approach to collaboration and why he thinks it can help teams resolve issues much faster. In our conversation, we discuss what the benefits of clinical troubleshooting are, why teams get tripped up on collaboration in the first place, what firefighting and incident response have in common and a lot more.
Whether you’re a seasoned vet when it comes to incident response, or just getting started, it can be easy to fall into the trap of doing too much all at once. And it just makes sense. Incident response is one of those things that doesn’t have a single, perfect formula, so teams can be left doing a little bit of everything in an effort to get it right. That said, there are some fundamentals that, regardless of how mature your organization is, can be a great launching-off point to better incident response. And that’s exactly what we’ll be talking about in today’s episode of The Debrief. This time around, we’re joined by Viktor Stanchev of Anchorage Digital, to chat about actionable advice for responding to incidents, from declaration to post-mortem. We cover what having a good incident response even means, why it’s important to declare incidents early, how to better communicate during incidents and a whole lot more. If you’ve been looking for practical advice for running incidents from a veteran in the space, you’re in the right place.
In last week’s episode of The Debrief, we had on Colette Alexander, Director of Engineering at HashiCorp, to discuss some of the myths around incident response. In that conversation, one of the myths we spoke about was the idea that asking “why” is better than asking “how.” In reality, asking “how” allows you to focus more on the contributing factors that led to an incident happening, whereas “why” tends to single out a person, which can lead to a lot of blame. For this episode, we’re diving a bit deeper into the reasons “how” is not only better for learning, it’s also better for the psychological safety of your team. This time around, we’re joined by Dennis Henry, who currently works on the Architecture team at Okta. Dennis is a big believer in psychological safety and learning from incidents, so he’s just the person to shed light on this fascinating topic.
What if we told you that everything you thought you knew about incident response was wrong? Well, at least some of it. That some of the things you’ve been doing for years might not actually be having the impact you thought they did. Or, even worse, that some of the assumptions you’ve been making have actually been having a negative impact on you, your team and your organization. This week, we’re talking about myths around incident response. And who better to dispel some of these myths than Director of Engineering at HashiCorp, Colette Alexander? We chat about myths around learning and process, why “why” is the wrong question to be asking after incidents, and why documenting risks doesn’t necessarily help you manage them.
Whether you’re a seasoned company with 10+ years of operations, or a startup that’s just getting off the ground, making sure you have a good culture of engineering is really important. Not only will this have a significant impact on the folks on your team, it’ll make a big difference with hiring. When everyone knows that your company is the place to be when it comes to culture, attracting really good talent becomes that much easier. But I was curious, what do some of the folks at incident.io think about engineering culture in general and how to best build it? Better yet, what about the engineering culture at incident.io? What’s it like? To answer all of these questions and more, I sat down with Lisa Karlin Curtis, Tech Lead, and Alicia Collymore, Engineering Manager, to get their perspectives on this incredibly important topic. We chat about what “culture” even means, why diversity is important, how teams can make sure their engineers feel empowered to share their perspectives and a whole lot more.
Q1 2024 is officially behind us. So we figured that it was a great time for a bit of reflection on the exciting start to the year. In this episode, we sit down with our founders, Stephen, Chris, and Pete, to get a bit of perspective on how the last three months played out. We chat about On-call, our AI launch, and the hundreds of other features, bug fixes, and bits of polish and delight that we've shipped over the last 12 weeks. We also chat about the state of the company as a whole, our growth, and ultimately what's on the horizon.