Ed and Anna are co-first authors on this work.
Emergent Misalignment found that fine-tuning models on narrowly misaligned data, such as insecure code [...]
---
Outline:
(00:16) TL;DR
(01:19) Introduction
(03:25) Coherent Emergent Misalignment
(07:02) EM with 0.5B Parameters
(08:11) EM with a Full Supervised Finetune
(09:13) EM with a Single Rank 1 LoRA Adapter
(10:01) Future Work
(11:05) Contributions
(11:33) Acknowledgments
The original text contained 6 footnotes which were omitted from this narration.
---
First published:
June 16th, 2025
Source:
https://www.lesswrong.com/posts/yHmJrDSJpFaNTZ9Tr/model-organisms-for-emergent-misalignment
---
Narrated by TYPE III AUDIO.