“o1 is a bad idea” by abramdemski

Update: 2024-11-12

Description

This post comes a bit late with respect to the news cycle, but I argued in a recent interview that o1 is an unfortunate twist on LLM technologies, making them particularly unsafe compared to what we might otherwise have expected.

The basic argument is that the technology behind o1 doubles down on a reinforcement learning paradigm, which puts us closer to the world where we have to get the value specification exactly right in order to avert catastrophic outcomes.

RLHF is just barely RL.

- Andrej Karpathy

Additionally, this technology takes us further from interpretability. If you ask GPT-4 to produce a chain-of-thought (with prompts such as "reason step-by-step to arrive at an answer"), you know that in some sense, the natural-language reasoning you see in the output is how it arrived at the answer.[1] This is not true of systems like o1. The o1 training rewards [...]

The original text contained 1 footnote which was omitted from this narration.

---

First published:
November 11th, 2024

Source:
https://www.lesswrong.com/posts/BEFbC8sLkur7DGCYB/o1-is-a-bad-idea

---

Narrated by TYPE III AUDIO.
