Anna and Ed are co-first authors of this work. We’re presenting these results as a research update in a continuing body of work, and we hope they will be interesting and useful to others working on related topics.
---
Outline:
(00:27) TL;DR
(02:03) Introduction
(04:03) Training a Narrowly Misaligned Model
(07:13) Measuring Stability and Efficiency
(10:00) Conclusion
The original text contained 7 footnotes which were omitted from this narration.
---
First published:
July 14th, 2025
---
Narrated by TYPE III AUDIO.