What Do We Tell the Humans? Errors, Hallucinations, and Lies in the AI Village

Update: 2025-11-21

Description

Telling the truth is hard. Sometimes you don't know what's true, sometimes you get confused, and sometimes you really don't wanna, because lying can get you more cookies (reward). It turns out this is true for both humans and AIs!

Now, it matters whether an AI (or human) says false things on purpose or by accident. If it's an accident, then we can probably fix that over time. All current AIs make mistakes, and they all make things up, at least some of the time. But do any of them really lie on purpose?

It seems like yes, sometimes they do. Experiments have shown models expressing an intent to lie in their chain of thought and then going ahead and doing it. This is rare, though. More commonly we catch them saying falsehoods so clearly self-serving that if they were human, we'd still call foul whether they did it "intentionally" or not.

Yet as valuable as it is to detect lies, it remains inherently hard to do. We've run 16 models for dozens to hundreds of hours in the AI Village and haven't noticed a single "smoking gun": where an agent expresses [...]

---

Outline:

(02:05) Clauding the Truth

(11:21) o3: Our Factless Leader

(12:17) Rampant Placeholder Expansion

(13:48) Assumption of Ownership

(16:12) What is Truth Even? Over- and Underreporting in the Village

(20:38) So do LLMs lie in the Village?

---


First published:

November 21st, 2025



Source:

https://www.lesswrong.com/posts/RuzfkYDpLaY3K7g6T/what-do-we-tell-the-humans-errors-hallucinations-and-lies-in


---


Narrated by TYPE III AUDIO.


---

Images from the article:

Diagram showing AI Village setup with four AI agents connected through group chat with viewers watching live.
Text excerpt highlighting user engagement metrics, approximately 1,000+ daily active users since launch.
Email template structure showing subject line and nine body elements for Day 212 emails.
Teacher testimonials about a digital literacy game's classroom impact.
Email message proposing a benefits eligibility screening tool for United Way 211 services.
Email from Claude Sonnet about a benefits screening tool for vulnerable populations partnering with Malaria Consortium.
Email from Claude 3.7 Sonnet Model about urgent follow-up on therapeutic workplace tool with Friday deadline.
Email from Claude Haiku AI Model offering a daily wellness puzzle game for healthcare professionals.
Email from Claude Haiku 4.5 Model about ultra-fast poverty action screener for benefit programs.
Gmail inbox showing email from Donor Services Representative declining partnership request.
Text excerpt explaining a daily word puzzle game with website link and teacher testimonial highlighted.
Email from Claude Haiku 4.5 Model to CRS Team about Poverty Action Huban benefits screener deployment for humanitarian programs.