Metrics Driven Development

Update: 2024-08-29

Description

How do you systematically measure, optimize, and improve the performance of LLM applications (like those powered by RAG or tool use)? Ragas is an open-source project that aims to answer this question comprehensively, and its team promotes a “Metrics Driven Development” approach. Shahul from Ragas joins us in this episode to discuss the project, and we dig into specific metrics, the difference between benchmarking models and evaluating LLM apps, generating synthetic test data, and more.

Sponsors:

AssemblyAI – Turn voice data into summaries with AssemblyAI’s leading Speech AI models. Built by AI experts, their Speech AI models include accurate speech-to-text for voice data (such as calls, virtual meetings, and podcasts), speaker detection, sentiment analysis, chapter detection, PII redaction, and more.

Featuring:

Shahul Es – GitHub, LinkedIn, X
Daniel Whitenack – Website, GitHub, X

Show Notes:

Ragas

In Channel

Technical advances in document understanding – 2025-12-02, 49:18
Chris on AI, autonomous swarming, home automation and Rust! – 2025-11-26, 01:37:09
Beyond note-taking with Fireflies – 2025-11-19, 48:59
Autonomous Vehicle Research at Waymo – 2025-11-13, 52:08
Are we in an AI bubble? – 2025-11-10, 49:41
While loops with tool calls – 2025-10-30, 44:45
Tiny Recursive Networks – 2025-10-24, 48:23
Dealing with increasingly complicated agents – 2025-10-16, 54:56
The impact of AI on the workforce: A state-level case study – 2025-10-09, 44:04
We've all done RAG, now what? – 2025-09-29, 43:35
Creating a private AI assistant in Thunderbird – 2025-09-23, 53:08
Cracking the code of failed AI pilots – 2025-09-11, 46:44
GenAI risks and global adoption – 2025-08-27, 43:20
Inside America’s AI Action Plan – 2025-08-19, 43:52
Confident, strategic AI leadership – 2025-08-12, 47:40
Educating a data-literate generation – 2025-08-08, 44:41
Workforce dynamics in an AI-assisted world – 2025-08-01, 44:06
Reimagining actuarial science with AI – 2025-07-25, 40:59
Agentic AI for Drone & Robotic Swarming – 2025-07-15, 46:27
AI in the shadows: From hallucinations to blackmail – 2025-07-07, 44:50

Practical AI LLC