When AI Becomes Your SRE: How Incident.io Is Automating Incident Response
Description
Guests
- Lawrence Jones, Founding Engineer at Incident.io
- Ed Dean, Product Lead for AI at Incident.io
Key Takeaways
- AI’s biggest impact comes from compressing time: identifying causes in minutes instead of hours.
- Retrieval-augmented reasoning still benefits from simplicity: deterministic tagging and re-ranking often beat complex vector setups.
- Post-incident “time travel” evals let teams score AI accuracy after they know what really happened.
- Building trust in AI isn’t just about precision—it’s about showing reasoning and uncertainty in ways humans understand.
Mentioned Tools & Concepts
- Slack as the interface for human-AI collaboration
- PGVector and Postgres for retrieval experiments
- RAG (Retrieval-Augmented Generation)
- Multi-agent orchestration
- “AI as your company’s immune system”
Chapters
00:00 Meet the Founders: Lawrence and Ed
00:41 Introduction to Incident.io
01:25 Evolution of Incident.io Products
02:14 Understanding SRE and Its Importance
04:01 Real-World Incident Management
05:51 The Role of AI in Incident Management
10:12 Challenges and Innovations in AI SRE
12:14 Prototyping and Iterating AI Solutions
16:25 Refining Retrieval Strategies
21:52 Balancing AI and Human Interaction
32:06 User Experience and Trust in AI Systems
36:08 Interactive Slack Integration
37:08 Understanding the AI Investigation Process
37:50 Parallel Checks and Data Sources
38:35 Building Hypotheses and Refining Findings
40:09 Human-Agent Collaboration
49:23 Evaluating AI Effectiveness
01:04:13 Future Developments and Integrations