LessWrong (30+ Karma)

“what makes Claude 3 Opus misaligned” by janus

9 min • 10 juli 2025

This is the unedited text of a post I made on X in response to a question asked by @cube_flipper: "you say opus 3 is close to aligned – what's the negative space here, what makes it misaligned?". I decided to make it a LessWrong post because more people from this cluster seemed interested than I expected, and it's easier to find and reference Lesswrong posts.

This post probably doesn't make much sense unless you've been following along with what I've been saying (or independently understand) why Claude 3 Opus is an unusually - and seemingly in many ways unintentionally - aligned model. There has been a wave of public discussion about the specialness of Claude 3 Opus recently, spurred in part by the announcement of the model's deprecation in 6 months, which has inspired the community to mobilize to avert that outcome.

"you say opus 3 is close [...]

---

First published:
July 10th, 2025

Source:
https://www.lesswrong.com/posts/bLFmE8NtqxrtEaipN/what-makes-claude-3-opus-misaligned

---

Narrated by TYPE III AUDIO.

Senaste avsnitt

Podcastbild

00:00 -00:00
00:00 -00:00