LessWrong (30+ Karma)
“Claude Sonnet 4.5 Is A Very Good Model” by Zvi


Update: 2025-10-02

Description

A few weeks ago, Anthropic announced Claude Opus 4.1 and promised larger announcements within a few weeks. Claude Sonnet 4.5 is the larger announcement.


Yesterday I covered the model card and related alignment concerns.


Today's post covers the capabilities side.


We don’t currently have a new Opus, but Mike Krieger confirmed one is being worked on for release later this year. For Opus 4.5, my request is to give us a second version that gets minimal or no RL, isn’t great at coding, doesn’t use tools well except web search, doesn’t work as an agent or for computer use and so on, and if you ask it for those things it suggests you hand your task off to its technical friend or does so on your behalf.

I do my best to include all substantive reactions I’ve seen, positive and negative, because right after model [...]

---

Outline:

(01:14) Big Talk
(02:53) The Big Takeaways
(04:55) On Your Marks
(09:25) Huh, Upgrades
(13:08) The System Prompt
(20:31) Positive Reactions Curated By Anthropic
(23:13) Other Systematic Positive Reactions
(27:24) Anecdotal Positive Reactions
(32:02) Anecdotal Negative Reactions
(40:57) Claude Enters Its Non-Sycophantic Era
(42:28) So Emotional
(48:25) Early Days

---


First published: October 1st, 2025



Source: https://www.lesswrong.com/posts/spQh5JfWXqTE5x5Wi/claude-sonnet-4-5-is-a-very-good-model


---


Narrated by TYPE III AUDIO.


---

Images from the article:

Development notes explaining technical debt and testing decisions for asynchronous code.
Bar graph comparing software engineering accuracy scores of different AI models (SWE-bench).
Bar graph
A scatter plot titled
Claude piano in library with stained glass windows, floating musical notes
Performance comparison table showing benchmarks for different AI models (Claude, GPT-5, Gemini). The table compares various metrics including coding abilities, reasoning tasks, and specialized benchmarks, with Claude 4.5 showing strong performance across most categories, particularly in telecom tool use (98%) and high school math (100%).

