“LLM robots can’t pass butter (and they are having an existential crisis about it)” by Lukas Petersson

Update: 2025-10-29

Description

TLDR:

Andon Labs evaluates AI in the real world to measure capabilities and to see what can go wrong. For example, we previously made LLMs operate vending machines, and now we're testing whether they can control robots in offices. There are two parts to this test:

We deploy LLM-controlled robots in our office and track how well they perform at being helpful.
We systematically test the robots on tasks in our office. We benchmark different LLMs against each other. You can read our paper "Butter-Bench" on arXiv: https://arxiv.org/abs/2510.21860v1

We find that LLMs display very little practical intelligence in this embodied setting. We think evals are important for safe AI development. We will report concerning incidents in our periodic safety reports.

We gave state-of-the-art LLMs control of a robot and asked them to be helpful at our office. While it was a very fun experience, we can’t say [...]

---

First published:

October 28th, 2025

Source:

https://www.lesswrong.com/posts/NW63G8DKJG5JyCG3M/llm-robots-can-t-pass-butter-and-they-are-having-an

---

Narrated by TYPE III AUDIO.

In Channel

“An Opinionated Guide to Privacy Despite Authoritarianism” by TurnTrout

2025-10-29 · 08:00

“The End of OpenAI’s Nonprofit Era” by garrison

2025-10-29 · 17:39

“Please Do Not Sell B30A Chips to China” by Zvi

2025-10-29 · 13:15

“AI Craziness Mitigation Efforts” by Zvi

2025-10-29 · 22:12

“Some data from LeelaPieceOdds” by Jeremy Gillen

2025-10-29 · 14:05

“When Will AI Transform the Economy?” by Andre.Infante

2025-10-29 · 15:43

“Workshop on Post-AGI Economics, Culture, and Governance” by Raymond Douglas, Jan_Kulveit, scasper, David Duvenaud

2025-10-29 · 03:59

“Introducing the Epoch Capabilities Index (ECI)” by luke_emberson, YafahEdelman, Jsevillamol

2025-10-29 · 01:56

“Mottes and Baileys in AI discourse” by Raemon

2025-10-29 · 15:56

“LLM robots can’t pass butter (and they are having an existential crisis about it)” by Lukas Petersson

2025-10-29 · 09:04

“The Memetics of AI Successionism” by Jan_Kulveit

2025-10-28 · 21:28

“All the lab’s AI safety Plans: 2025 edition” by Algon

2025-10-28 · 31:57

“life lessons from trading” by thiccythot

2025-10-28 · 08:52

“Stability of natural latents in information theoretic terms” by Aram Ebtekar

2025-10-27 · 05:57

“AIs should also refuse to work on capabilities research” by Davidmanheim

2025-10-27 · 06:35

“FWIW: What I noticed at a (Goenka) Vipassana retreat” by David Gross

2025-10-27 · 15:35

“Cancer has a surprising amount of detail” by Abhishaike Mahajan

2025-10-27 · 23:55

“Credit goes to the presenter, not the inventor” by Algon

2025-10-27 · 06:18

“On Fleshling Safety: A Debate by Klurl and Trapaucius.” by Eliezer Yudkowsky

2025-10-27 · 02:22:22

“Brightline is Actually Pretty Dangerous” by jefftk

2025-10-26 · 06:54
