AI Evals & Discovery
Updated: 2025-09-23
Description
What you’ll learn in this episode:
- What “evals” actually mean in the AI/ML world
- Why evals are more than just quality assurance
- The difference between golden datasets, synthetic data, and real-world traces
- How to identify error modes and turn them into evals
- When to use code-based evals vs. LLM-as-judge evals (see the sketch after this list)
- How discovery practices inform every step of AI product evaluation
- Why evals require continuous maintenance (and what “criteria drift” means for your product)
- The relationship between evals, guardrails, and ongoing human oversight
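To make the code-based vs. LLM-as-judge distinction concrete, here is a minimal sketch in Python. It is not from the episode: the trace format, the grading criterion, and the `call_llm` helper are illustrative placeholders, not Teresa's actual Interview Coach rubric. The first check validates a structural criterion directly in code; the second delegates a subjective criterion to a judge model via a prompt.

```python
import json

# --- Code-based eval: deterministic, cheap, runs on every trace ---
# Checks a structural criterion that code can verify directly, e.g. that the
# coach's reply is valid JSON and contains exactly one follow-up question.
def code_based_eval(model_output: str) -> bool:
    try:
        parsed = json.loads(model_output)
    except json.JSONDecodeError:
        return False
    if not isinstance(parsed, dict):
        return False
    question = parsed.get("follow_up_question", "")
    return isinstance(question, str) and question.count("?") == 1

# --- LLM-as-judge eval: for subjective criteria code can't easily check ---
# The judge prompt asks another model to grade the output against a rubric.
# `call_llm` is a placeholder for whichever LLM client you actually use.
JUDGE_PROMPT = """You are grading an interview-coaching assistant.
Criterion: the follow-up question must be open-ended and story-based
(asking about a specific past experience), not hypothetical or leading.

Assistant output:
{output}

Answer with a single word: PASS or FAIL."""

def llm_as_judge_eval(model_output: str, call_llm) -> bool:
    verdict = call_llm(JUDGE_PROMPT.format(output=model_output))
    return verdict.strip().upper().startswith("PASS")

if __name__ == "__main__":
    sample = '{"follow_up_question": "Tell me about the last time you booked a trip?"}'
    print("code-based:", code_based_eval(sample))
    # llm_as_judge_eval(sample, call_llm=my_client)  # needs a real LLM client
```

A common rule of thumb in the evals literature is to reach for code-based checks first (fast, deterministic, free to run on every trace) and reserve LLM judges for criteria that genuinely require judgment.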
Resources & Links:
- Follow Teresa Torres: https://ProductTalk.org
- Follow Petra Wille: https://Petra-Wille.com
Mentioned in the episode:
- How I Designed & Implemented Evals for Product Talk’s Interview Coach by Teresa Torres
- Teresa’s Interview Coach
- ML (Machine learning)
- Story-Based Customer Interviews on-demand course by Teresa
- LLM (Large language model)
- AI Evals for Engineers and PMs course (get 35% off through Teresa’s link) on Maven
- V0
- JSON (JavaScript Object Notation)
- Anthropic
- The Product Leadership Wheel - A Framework for Defining and Growing Product Leadership at Scale by Petra Wille
- Lovable
- Behind the Scenes: Building the Product Talk Interview Coach by Teresa
- Previous episode: Building AI Products
Coming soon from Teresa:
- Weekly Monday posts sharing lessons learned while building AI products
- A new podcast interviewing cross-functional teams about real-world AI product development stories