This piece is based on work conducted during MATS 8.0 and is part of a broader effort to interpret chain-of-thought in reasoning models.
---
Outline:
(00:21) tl;dr
(01:54) Unfaithfulness
(04:16) CoT is functional, and faithfulness lacks benefits
(09:29) Hidden information can nudge CoTs
(09:48) Silent, soft, and steady spectres
(13:48) Nudges are plausible
(14:42) Nudged CoTs are hard to spot
(15:49) Safety and CoT monitoring
(18:11) Final summary
The original text contained five footnotes, which were omitted from this narration.
---
First published:
July 22nd, 2025
Source:
https://www.lesswrong.com/posts/vPAFPpRDEg3vjhNFi/unfaithful-chain-of-thought-as-nudged-reasoning
---
Narrated by TYPE III AUDIO.