LessWrong (30+ Karma)

“Opus 4.1 Is An Incremental Improvement” by Zvi

14 min • August 7, 2025

Claude Opus 4 has been updated to Claude Opus 4.1.

This is a correctly named incremental update, with the bigger news being ‘we plan to release substantially larger improvements to our models in the coming weeks.’

It is still worth noting if you code, as there are many indications this is a larger practical jump in performance than one might think.

We also got a change to the Claude.ai system prompt that helps with sycophancy and a few other issues, such as coming out and Saying The Thing more readily. It's going to be tricky to disentangle these changes, but that means Claude effectively got better for everyone, not only those doing agentic coding.

Tomorrow we get an OpenAI livestream that is presumably GPT-5, so I’m getting this out of the way now. Current plan is to cover GPT-OSS on Friday, and GPT-5 on Monday.

[...]

---

Outline:

(01:01) Introducing Claude Opus 4.1

(05:25) The System Card

(09:56) Reactions

---

First published:
August 6th, 2025

Source:
https://www.lesswrong.com/posts/hicuZJQwRYCiFCZbq/opus-4-1-is-an-incremental-improvement

---

Narrated by TYPE III AUDIO.

---

Images from the article:

Table comparing harmless response rates between Claude Opus 4.1 and 4 models.
Radar chart comparing Claude 4 and 4.1 Opus Agent capabilities across 8 metrics.
A humorous comparison showing the different behaviors of AI models Claude and GPT-4, depicted through simple stick figures. The top shows Claude figures standing calmly on a platform, while the bottom shows increasingly chaotic GPT versions performing acrobatics and stunts, ending with one on a burning car.
Table showing refusal rates for Claude Opus 4.1 and 4 models.
Performance comparison table showing benchmarks for Claude and other AI models. The table compares metrics across different versions of Claude (Opus 4.1, Opus 4, Sonnet 4), OpenAI o3, and Gemini 2.5 Pro on tasks like coding, reasoning, and multilingual capabilities, with detailed percentage scores for each category.
Bar graph comparing Software Engineering scores for Opus 4 and 4.1.
Bar graph titled

Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.
