Data at scale: Shaping the AI landscape with Scale AI
Description
In this episode of Get the Check, the hosts discuss Scale AI's role in the AI ecosystem. With AI models requiring high-quality, labeled data to perform at their best, Scale AI has positioned itself at the forefront of creating quality data. The hosts explore Scale's journey, from relying on contractors to grow its data labeling operations to its long-standing partnerships with the DoD and OpenAI.
The hosts break down the three pillars of AI (compute, data, and algorithms) and take a closer look at Scale’s history, its innovative products, and the controversies surrounding labor in data labeling. They dive into types of training data, including input-output pairs, feedback data for reinforcement learning from human feedback (RLHF), and the workflow data needed to power the shift from generative AI to agentic AI. Additionally, they touch on Scale’s new offerings, including expert data labeling, MLOps for enterprises, and Defense Llama, a defense-focused model built in collaboration with Meta to power U.S. military AI capabilities.
Tune in for insights on how Scale AI is leveraging human expertise to create high-quality datasets that power everything from autonomous vehicles to defense technologies. You can follow @getthecheckpod on all socials. Stay tuned for next week’s episode!
00:00 Pillars of AI
01:30 What is data labeling?
04:46 Can synthetic data replace real data?
07:37 Classes of data
12:04 Founding story
18:39 Scale Donovan
22:33 Using private data for agentic AI
27:19 SEAL LLM Leaderboard
30:30 Expert Match
32:34 Hiring at Scale AI
38:02 Restaurant pick of the week