Why validity beats scale when building multi‑step AI systems

Update: 2026-01-06

Description

In this episode, Dr. Sebastian (Seb) Benthall joins us to discuss the paper he wrote with Andrew, “Validity Is What You Need,” and how it applies to building agentic AI that actually works in the real world.

Our discussion connects systems engineering, mechanism design, and requirements engineering to multi‑step AI that delivers measurable outcomes in the enterprise.

Defining agentic AI beyond LLM hype
Limits of scale and the need for multi‑step control
Tool use, compounding errors, and guardrails
Systems engineering patterns for AI reliability
Principal–agent framing for governance
Mechanism design for multi‑stakeholder alignment
Requirements engineering as the crux of validity
Hybrid stacks: LLM interface, deterministic solvers
Regression testing through model swaps and drift
Moving from universal copilots to fit‑for‑purpose agents

You can also catch more of Seb's research on our podcast. Tune in to Contextual integrity and differential privacy: Theory versus application.

What did you think? Let us know.

Do you have a question or a discussion topic for the AI Fundamentalists? Connect with them to comment on your favorite topics:

LinkedIn - Episode summaries, shares of cited articles, and more.
YouTube - Was it something that we said? Good. Share your favorite quotes.
Visit our page - See past episodes and submit your feedback! It continues to inspire future episodes.

Duration: 40:16



Dr. Andrew Clark & Dr. Sid Mangalik
