LessWrong (30+ Karma)

“Mech interp is not pre-paradigmatic” by Lee Sharkey

30 min • June 10, 2025

This is a blogpost version of a talk I gave earlier this year at GDM.

Epistemic status: Vague and handwavy. Nuance is often missing. Some of the claims depend on implicit definitions that may be reasonable to disagree with. But overall I think it's directionally true.

It's often said that mech interp is pre-paradigmatic.

I think it's worth being skeptical of this claim.

In this post I argue that:

  • Mech interp is not pre-paradigmatic.
  • Within its existing paradigm, there have been "waves" (mini-paradigms), two so far.
  • Second-Wave Mech Interp has recently entered a 'crisis' phase.
  • We may be on the edge of a third wave.

Preamble: Kuhn, paradigms, and paradigm shifts

First, we need to be familiar with the basic definition of a paradigm:

A paradigm is a distinct set of concepts or thought patterns, including theories, research [...]

---

Outline:

(00:58) Preamble: Kuhn, paradigms, and paradigm shifts

(03:56) Claim: Mech Interp is Not Pre-paradigmatic

(07:56) First-Wave Mech Interp (ca. 2012 - 2021)

(10:21) The Crisis in First-Wave Mech Interp

(11:21) Second-Wave Mech Interp (ca. 2022 - ??)

(14:23) Anomalies in Second-Wave Mech Interp

(17:10) The Crisis of Second-Wave Mech Interp (ca. 2025 - ??)

(18:25) Toward Third-Wave Mechanistic Interpretability

(20:28) The Basics of Parameter Decomposition

(22:40) Parameter Decomposition Questions Foundational Assumptions of Second-Wave Mech Interp

(24:13) Parameter Decomposition In Theory Resolves Anomalies of Second-Wave Mech Interp

(27:27) Conclusion

The original text contained 6 footnotes which were omitted from this narration.

---

First published:
June 10th, 2025

Source:
https://www.lesswrong.com/posts/beREnXhBnzxbJtr8k/mech-interp-is-not-pre-paradigmatic

---

Narrated by TYPE III AUDIO.

---

Images from the article:

Academic slide discussing pre-paradigm phase of mechanistic interpretation in neuroscience, with diagrams.

The slide includes three scientific figures: a hierarchical neural connectivity diagram from Hubel and Wiesel (1958), a complex network visualization from Rousselet (2004), and neural network weights visualization from Rumelhart (1986). The main text outlines concepts, methods, and standards from computational neuroscience and connectionism.
Slide showing parameter decomposition concept with neural network diagrams and matrix visualization.

The image illustrates parameter decomposition in neural networks, showing a process of flattening network weights into a parameter vector, then decomposing it into simpler components. The diagram includes matrix representations and simplified network structures to demonstrate how the decomposition works.

Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.
