This is a blog post version of a talk I gave earlier this year at GDM.
Epistemic status: Vague and handwavy. Nuance is often missing. Some of the claims depend on implicit definitions that may be reasonable to disagree with. But overall I think it's directionally true.
It's often said that mech interp is pre-paradigmatic.
I think it's worth being skeptical of this claim.
In this post I argue that:
Preamble: Kuhn, paradigms, and paradigm shifts
First, we need to be familiar with the basic definition of a paradigm:
A paradigm is a distinct set of concepts or thought patterns, including theories, research [...]
---
Outline:
(00:58) Preamble: Kuhn, paradigms, and paradigm shifts
(03:56) Claim: Mech Interp is Not Pre-paradigmatic
(07:56) First-Wave Mech Interp (ca. 2012 - 2021)
(10:21) The Crisis in First-Wave Mech Interp
(11:21) Second-Wave Mech Interp (ca. 2022 - ??)
(14:23) Anomalies in Second-Wave Mech Interp
(17:10) The Crisis of Second-Wave Mech Interp (ca. 2025 - ??)
(18:25) Toward Third-Wave Mechanistic Interpretability
(20:28) The Basics of Parameter Decomposition
(22:40) Parameter Decomposition Questions Foundational Assumptions of Second-Wave Mech Interp
(24:13) Parameter Decomposition In Theory Resolves Anomalies of Second-Wave Mech Interp
(27:27) Conclusion
The original text contained 6 footnotes which were omitted from this narration.
---
First published:
June 10th, 2025
Source:
https://www.lesswrong.com/posts/beREnXhBnzxbJtr8k/mech-interp-is-not-pre-paradigmatic
---
Narrated by TYPE III AUDIO.