#055 Embedding Intelligence: AI's Move to the Edge

Update: 2025-08-13

Nicolay here,

while everyone races to cloud-scale LLMs, Pete Warden is solving AI problems by going completely offline. No network connectivity required.

Today I have the chance to talk to Pete Warden, CEO of Useful Sensors and author of the TinyML book.

His philosophy: if you can't explain to users exactly what happens to their data, your privacy model is broken.

Key Insight: The Real World Action Gap

LLMs excel at text-to-text transformations but fail catastrophically at connecting language to physical actions. There's nothing in the web corpus that teaches a model how "turn on the light" maps to sending a pin high on a microcontroller.

This explains why every AI agent demo focuses on booking flights and making API calls: those actions are documented in text. The moment you step off the web into real-world device control, even simple commands become impossible without custom training on action-to-outcome data.

Pete's company builds speech-to-intent systems that skip text entirely, going directly from audio to device actions using embeddings trained on limited action sets.
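The speech-to-intent idea can be sketched as nearest-neighbor matching in embedding space: encode the audio, compare it against embeddings of a small canonical action set, and emit a device action directly, with no transcript in between. This is a minimal illustrative sketch, not Useful Sensors' actual model; the action names and random vectors are hypothetical stand-ins for a trained audio encoder's output.

```python
import numpy as np

# Hypothetical canonical actions and their embeddings. In a real system
# these vectors would come from an audio encoder trained on the
# constrained action set; random vectors here are only for illustration.
rng = np.random.default_rng(0)
ACTIONS = ["light_on", "light_off", "volume_up"]
action_embeddings = {name: rng.standard_normal(16) for name in ACTIONS}

def classify_intent(audio_embedding, threshold=0.0):
    """Map an audio embedding directly to a device action via cosine
    similarity, without producing an intermediate transcript."""
    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    scores = {name: cosine(audio_embedding, emb)
              for name, emb in action_embeddings.items()}
    best = max(scores, key=scores.get)
    # Reject utterances that match nothing well: a constrained action
    # set makes "none of the above" a meaningful outcome.
    return best if scores[best] > threshold else None

# An embedding near "light_on" classifies as that action.
query = action_embeddings["light_on"] + 0.1 * rng.standard_normal(16)
result = classify_intent(query)
```

Keeping the similarity comparison at the very end is what "preserving ambiguity until final classification" means: no early, lossy commitment to one text string.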

💡 Core Concepts

Speech-to-Intent: Direct audio-to-action mapping that bypasses text conversion, preserving ambiguity until final classification

ML Sensors: Self-contained circuit boards processing sensitive data locally, outputting only simple signals without exposing raw video/audio

Embedding-Based Action Matching: Vector representations mapping natural language variations to canonical device actions within constrained domains

⏱ Important Moments

Real World Action Problem: [06:27] LLMs discuss turning on lights but lack training data connecting text commands to device control

Apple Intelligence Challenges: [04:07] Design-led culture clashes with AI accuracy limitations

Speech-to-Intent vs Speech-to-Text: [12:01] Breaking audio into text loses critical ambiguity information

Limited Action Set Strategy: [15:30] Smart speakers succeed by constraining to ~3 functions rather than infinite commands

8-Bit Quantization: [33:12] Remains the deployment sweet spot; processor instruction support matters more than compression

On-Device Privacy: [47:00] Complete local processing provides explainable guarantees vs. confusing hybrid systems
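The 8-bit quantization point can be made concrete with a small sketch: symmetric per-tensor quantization maps float weights to int8 with a single scale factor, and the round trip loses at most half a quantization step. This is a generic illustration, not any specific framework's implementation, and the weight values are made up.

```python
import numpy as np

def quantize_int8(weights):
    """Symmetric per-tensor 8-bit quantization: one scale factor maps
    floats into the int8 range [-127, 127]."""
    scale = np.max(np.abs(weights)) / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 values."""
    return q.astype(np.float32) * scale

w = np.array([0.5, -1.27, 0.003, 1.0], dtype=np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
# Worst-case rounding error is half a quantization step, i.e. scale / 2.
max_err = np.max(np.abs(w - w_hat))
```

The practical point from the episode is that int8's advantage is less about the 4x size reduction and more that microcontrollers and mobile CPUs ship native int8 instructions, so the quantized model actually runs faster.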

🛠 Tools & Tech

Whisper: github.com/openai/whisper

Moonshine: github.com/usefulsensors/moonshine

TinyML Book: oreilly.com/library/view/tinyml/9781492052036

Stanford Edge ML: github.com/petewarden/stanford-edge-ml

📚 Resources

Looking to Listen Paper: looking-to-listen.github.io

Lottery Ticket Hypothesis: arxiv.org/abs/1803.03635

Connect: pete@usefulsensors.com | petewarden.com | usefulsensors.com

Beta Opportunity: Moonshine browser implementation for client-side speech processing in JavaScript



Nicolay Gerold