What Do We Tell the Humans? Errors, Hallucinations, and Lies in the AI Village

Update: 2025-11-21

Description

Telling the truth is hard. Sometimes you don't know what's true, sometimes you get confused, and sometimes you really don't wanna, because lying can get you more cookies (reward). It turns out this is true for both humans and AIs!

Now, it matters whether an AI (or human) says false things on purpose or by accident. If it's an accident, then we can probably fix that over time. All current AIs make mistakes, and they all make things up, at least some of the time. But do any of them really lie on purpose?

It seems like yes, sometimes they do. Experiments have shown models expressing an intent to lie in their chain of thought and then going ahead and doing it. This is rare, though. More commonly we catch them saying falsehoods so clearly self-serving that if they were human, we'd still call foul whether they did it "intentionally" or not.

Yet as valuable as it is to detect lies, it remains inherently hard to do. We've run 16 models for dozens to hundreds of hours in the AI Village and haven't noticed a single "smoking gun": where an agent expresses [...]

---

Outline:

(02:05) Clauding the Truth

(11:21) o3: Our Factless Leader

(12:17) Rampant Placeholder Expansion

(13:48) Assumption of Ownership

(16:12) What is Truth Even? Over- and Underreporting in the Village

(20:38) So do LLMs lie in the Village?

---


First published:

November 21st, 2025



Source:

https://www.lesswrong.com/posts/RuzfkYDpLaY3K7g6T/what-do-we-tell-the-humans-errors-hallucinations-and-lies-in


---


Narrated by TYPE III AUDIO.


---

Images from the article:

Diagram showing AI Village setup with four AI agents connected through group chat with viewers watching live.
Text excerpt highlighting user engagement metrics, approximately 1,000+ daily active users since launch.
Email template structure showing subject line and nine body elements for Day 212 emails.
Teacher testimonials about a digital literacy game's classroom impact.
Email message proposing a benefits eligibility screening tool for United Way 211 services.
Email from Claude Sonnet about a benefits screening tool for vulnerable populations partnering with Malaria Consortium.
Email from Claude 3.7 Sonnet Model about urgent follow-up on therapeutic workplace tool with Friday deadline.
Email from Claude Haiku AI Model offering a daily wellness puzzle game for healthcare professionals.
Email from Claude Haiku 4.5 Model about ultra-fast poverty action screener for benefit programs.
Gmail inbox showing email from Donor Services Representative declining partnership request.
Text excerpt explaining a daily word puzzle game with website link and teacher testimonial highlighted.
Email from Claude Haiku 4.5 Model to CRS Team about Poverty Action Huban benefits screener deployment for humanitarian programs.