Haize Labs with Leonard Tang - Weaviate Podcast #121!

Update: 2025-05-12

Description

How do you ensure your AI systems actually do what you expect them to do? Leonard Tang takes us deep into the revolutionary world of AI evaluation with concrete techniques you can apply today. Learn how Haize Labs is transforming AI testing through "scaling judge-time compute" - stacking weaker models to effectively evaluate stronger ones. Leonard unpacks the game-changing Verdict library that outperforms frontier models by 10-20% while dramatically reducing costs. Discover practical insights on creating contrastive evaluation sets that extract maximum signal from human feedback, implementing debate-based judging systems, and building custom reward models that align with enterprise needs. The conversation reveals powerful nuggets like using randomized agent debates to achieve consensus and lightweight guardrail models that run alongside inference. Whether you're developing AI applications or simply fascinated by how we'll ensure increasingly powerful AI systems perform as expected, this episode delivers immediate value with techniques you can implement right away, philosophical perspectives on AI safety, and a glimpse into the future of evaluation that will fundamentally shape how AI evolves.

In Channel

Semantic Query Engines with Matthew Russo - Weaviate Podcast #131!

2025-11-18 (01:02:25)

REFRAG with Xiaoqiang Lin - Weaviate Podcast #130!

2025-11-03 (01:00:00)

Weaviate and SAS with Saurabh Mishra and Bob van Luijt - Weaviate Podcast #129!

2025-10-13 (43:55)

Weaviate's Query Agent with Charles Pierse - Weaviate Podcast #128!

2025-09-22 (01:01:32)

GEPA with Lakshya A. Agrawal - Weaviate Podcast #127!

2025-08-13 (01:01:55)

Agentic Topic Modeling with Maarten Grootendorst - Weaviate Podcast #126!

2025-07-09 (01:05:18)

Sufficient Context with Hailey Joren - Weaviate Podcast #125!

2025-07-02 (50:53)

RAG Benchmarks with Nandan Thakur - Weaviate Podcast #124!

2025-06-25 (01:04:46)

MUVERA with Rajesh Jayaram and Roberto Esposito - Weaviate Podcast #123!

2025-05-28 (01:13:06)

Patronus AI with Anand Kannappan - Weaviate Podcast #122!

2025-05-15 (01:01:06)

Haize Labs with Leonard Tang - Weaviate Podcast #121!

2025-05-12 (54:15)

Box AI with Ben Kus and Bob van Luijt

2025-05-07 (55:32)

Structured Outputs with Will Kurt and Cameron Pfiffer - Weaviate Podcast #119!

2025-04-09 (01:10:17)

Synthetic Data with David Berenstein and Ben Burtenshaw - Weaviate Podcast #118!

2025-03-25 (01:02:01)

Letta AI with Sarah Wooders - Weaviate Podcast #117!

2025-03-03 (57:34)

Agent Experience with Matt Biilmann, Sebastian Witalec, and Charles Pierse - Weaviate Podcast #116!

2025-02-27 (52:09)

Optimizing Retrieval Agents with Shirley Wu - Weaviate Podcast #115!

2025-02-19 (01:00:20)

Contextual AI with Amanpreet Singh - Weaviate Podcast #114!

2025-02-12 (57:56)

Cartesia AI with Karan Goel - Weaviate Podcast #113!

2025-01-28 (53:45)

Google Vertex AI RAG Engine with Lewis Liu and Bob van Luijt - Weaviate Podcast #112!

2025-01-15 (58:16)
