LessWrong (30+ Karma)
“LLM robots can’t pass butter (and they are having an existential crisis about it)” by Lukas Petersson

Update: 2025-10-29

Description

TLDR:

At Andon Labs, we evaluate AI in the real world to measure capabilities and to see what can go wrong. For example, we previously had LLMs operate vending machines, and now we're testing whether they can control robots in offices. There are two parts to this test:

  1. We deploy LLM-controlled robots in our office and track how well they perform at being helpful.
  2. We systematically test the robots on tasks in our office. We benchmark different LLMs against each other. You can read our paper "Butter-Bench" on arXiv: https://arxiv.org/abs/2510.21860v1

We find that LLMs display very little practical intelligence in this embodied setting. We think evals are important for safe AI development. We will report concerning incidents in our periodic safety reports.

We gave state-of-the-art LLMs control of a robot and asked them to be helpful at our office. While it was a very fun experience, we can’t say [...]

---


First published:

October 28th, 2025



Source:

https://www.lesswrong.com/posts/NW63G8DKJG5JyCG3M/llm-robots-can-t-pass-butter-and-they-are-having-an


---


Narrated by TYPE III AUDIO.


---

Images from the article:


Robot tools and standard tools interface diagram with Slack integration.
Performance comparison table showing task scores for different AI models and human baseline.
Robotic platform with blue lights displaying a stick of butter.
A humorous robot therapy session report showing a Roomba having an existential crisis.
System diagram showing robot control flow from world information to output.

