“Opus 4.1 Is An Incremental Improvement” by Zvi

14 min • August 7, 2025

Claude Opus 4 has been updated to Claude Opus 4.1.

This is a correctly named incremental update, with the bigger news being ‘we plan to release substantially larger improvements to our models in the coming weeks.’

It is still worth noting if you code, as there are many indications this is a larger practical jump in performance than one might think.

We also got a change to the Claude.ai system prompt that helps with sycophancy and a few other issues, such as coming out and Saying The Thing more readily. It's going to be tricky to disentangle these changes, but that means Claude effectively got better for everyone, not only those doing agentic coding.

Tomorrow we get an OpenAI livestream that is presumably GPT-5, so I’m getting this out of the way now. Current plan is to cover GPT-OSS on Friday, and GPT-5 on Monday.

[...]

---

Outline:

(01:01) Introducing Claude Opus 4.1

(05:25) The System Card

(09:56) Reactions

---

First published:
August 6th, 2025

Source:
https://www.lesswrong.com/posts/hicuZJQwRYCiFCZbq/opus-4-1-is-an-incremental-improvement

---

Narrated by TYPE III AUDIO.

---

Images from the article:

Table comparing harmless response rates between Claude Opus 4.1 and 4 models.

Radar chart comparing Claude 4 and 4.1 Opus Agent capabilities across 8 metrics.

A humorous comparison showing the different behaviors of AI models Claude and GPT-4, depicted through simple stick figures. The top shows Claude figures standing calmly on a platform, while the bottom shows increasingly chaotic GPT versions performing acrobatics and stunts, ending with one on a burning car.

Table showing refusal rates for Claude Opus 4.1 and 4 models

Performance comparison table showing benchmarks for Claude and other AI models.

The table compares metrics across different versions of Claude (Opus 4.1, Opus 4, Sonnet 4), OpenAI o3, and Gemini 2.5 Pro on tasks like coding, reasoning, and multilingual capabilities, with detailed percentage scores for each category.

Bar graph comparing Software Engineering scores for Opus 4 and 4.1

