Nick and Lily are co-first authors on this project. Lewis and Neel jointly supervised this project.
TL;DR
---
Outline:
(00:22) TL;DR
(01:48) Introduction
(04:41) Preliminaries
(06:09) Data Diffing
(07:16) Identifying known differences from datasets
(09:09) Discovering novel differences between model behavior
(14:26) Correlations
(16:21) Finding known correlations
(17:45) Finding unknown correlations
(17:58) Finding bias in internet comments
(19:52) Finding patterns in model responses
(20:51) Clustering
(22:39) Discovering known clusters
(24:26) Discovering unknown clusters
(26:13) Retrieval
(33:45) Discussion and Limitations
(35:06) Acknowledgments
The original text contained 6 footnotes which were omitted from this narration.
---
First published: August 15th, 2025
Source: https://www.lesswrong.com/posts/a4EDinzAYtRwpNmx9/towards-data-centric-interpretability-with-sparse
---
Narrated by TYPE III AUDIO.