#002: The secrets to building secure & scalable OTA infrastructure with Nick Sinas

Updated: 2024-12-18

Description

In today’s Coredump Session, the team dives deep into over-the-air (OTA) updates: why they matter, how they break, and what it takes to get them right. From horror stories involving IR updates in a snowstorm to best practices for deploying secure firmware to medical devices, the conversation covers the full OTA stack: device, cloud, process, and people. It's equal parts cautionary tale and technical masterclass.

Key Takeaways:

OTA is essential for modern hardware: without it, even a small bug can require a massive field operation.
Good OTA starts early, ideally at the product design and architecture phase.
Bootloaders, memory maps, and security keys must be carefully planned to avoid long-term issues.
Staged rollouts and cohorts help mitigate fleet-wide disasters.
Signing keys and root certificates should be treated like firmware: versioned, updatable, and secure.
Real-world constraints (medical, smart home, etc.) make OTA more complex, but they don't make it optional.
Testing both the update and the update mechanism itself is critical before going live.
When OTA fails, a fallback plan (such as dual banks or A/B slots) can be the difference between a patch and a catastrophe.
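The staged-rollout takeaway can be illustrated with a short sketch. This is not the approach described in the episode, just one common way to form cohorts, under stated assumptions: each device has a stable string ID, and hashing that ID (SHA-256 here, an arbitrary choice) into one of 100 buckets keeps every device in the same cohort across runs. The names `rollout_bucket`, `in_rollout`, and the `STAGES` schedule are hypothetical.

```python
import hashlib


def rollout_bucket(device_id: str) -> int:
    """Map a device ID to a stable bucket in [0, 99] (hypothetical helper)."""
    digest = hashlib.sha256(device_id.encode("utf-8")).digest()
    # Use the first 4 bytes of the digest as an unsigned integer, then reduce mod 100.
    return int.from_bytes(digest[:4], "big") % 100


def in_rollout(device_id: str, percent: int) -> bool:
    """True if this device belongs to the cohort for the current rollout percentage."""
    return rollout_bucket(device_id) < percent


# Hypothetical staged schedule: widen to the next stage only after the
# previous one has been observed to be healthy.
STAGES = [1, 10, 50, 100]

# A made-up fleet of device IDs, used only to show cohort sizes per stage.
fleet = [f"device-{n:04d}" for n in range(1000)]
for percent in STAGES:
    cohort = [d for d in fleet if in_rollout(d, percent)]
    print(f"{percent:>3}% stage -> {len(cohort)} devices")
```

Because eligibility is a threshold on a fixed bucket, raising the percentage only adds devices to the rollout. A device that already received the update never falls out of its cohort, and a bad release caught at the 1% stage reaches roughly one device in a hundred rather than the whole fleet.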

Chapters:

00:00 Episode Teasers & Intro

03:29 Meet the Guests + OTA Gut Reactions

05:33 Why OTA Is Non-Negotiable

03:29 The OTA Wake-Up Call: Why You Need It

09:31 Building OTA into Hardware from Day One

16:49 Cloud-Side OTA: Cohorts, Load, and Timing

21:53 OTA in Regulated Industries

30:10 When OTA Breaks Itself

34:44 Minimizing OTA Risk: The Defensive Playbook

41:18 OTA and the Matter Standard

47:17 Networking Stacks, Constraints, and Reliability