#002: The secrets to building secure & scalable OTA infrastructure with Nick Sinas
Description
In today’s Coredump Session, the team dives deep into the world of over-the-air (OTA) updates—why they matter, how they break, and what it takes to get them right. From horror stories involving IR updates in a snowstorm to best practices for deploying secure firmware across medical devices, this conversation covers the full stack of OTA: device, cloud, process, and people. It's equal parts cautionary tale and technical masterclass.
Key Takeaways:
- OTA is essential for modern hardware—without it, even small bugs can require massive field operations.
- Good OTA starts early, ideally at the product design and architecture phase.
- Bootloaders, memory maps, and security keys must be carefully planned to avoid long-term issues.
- Staged rollouts and cohorts help mitigate fleet-wide disasters.
- Signing keys and root certificates should be treated like firmware—versioned, updatable, and secure.
- Real-world constraints (medical, smart home, etc.) make OTA more complex—but not optional.
- Testing both the update and the update mechanism itself is critical before going live.
- When OTA fails, fallback plans (like dual banks or A/B slots) can be the difference between a patch and a catastrophe.
Chapters:
00:00 Episode Teasers & Intro
03:29 Meet the Guests + OTA Gut Reactions
05:33 Why OTA Is Non-Negotiable
03:29 The OTA Wake-Up Call: Why You Need It
09:31 Building OTA into Hardware from Day One
16:49 Cloud-Side OTA: Cohorts, Load, and Timing
21:53 OTA in Regulated Industries
30:10 When OTA Breaks Itself
34:44 Minimizing OTA Risk: The Defensive Playbook
41:18 OTA and the Matter Standard
47:17 Networking Stacks, Constraints, and Reliability
51:11 Security, Scale, and the OTA Future