LessWrong (30+ Karma)

“Statistical takes for mech interp research and beyond” by Paul Bogdan

31 min • August 6, 2025

I am currently a MATS 8.0 scholar studying mechanistic interpretability with Neel Nanda. I’m also a postdoc in psychology/neuroscience. Perhaps my most notable paper analyzed the last 20 years of psychology research, searching for trends in which papers do and do not replicate. I have some takes on statistics.

tl;dr

Small p-values are nice.
Unless they're suspiciously small.

Statistical assumptions can be bent.
Except for independence.

Practical significance beats statistical significance.
Although practicality depends on context.

The measure is not the confound.
Sometimes it's close enough.

Readability often beats rigor.
But fearing rigor means you probably need it.

Simple is better than complex.
Complex is better than wrong.

Complex wrongs are the worst.
But permutation tests can help reveal them.

This post offers advice for frequentist and classifier-based analysis. I try to focus on points that are practical, diverse, and non-obvious. I emphasize relevancy for mechanistic interpretability research, but [...]
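
For readers unfamiliar with the permutation tests mentioned in the tl;dr, here is a minimal sketch of the general technique, not the author's own procedure. The function name `permutation_test`, its parameters, and the choice of a difference in means as the test statistic are all assumptions made for illustration. It compares the observed difference to the differences obtained after repeatedly shuffling the group labels.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Hypothetical helper for illustration; the test statistic and the
    defaults are assumptions, not taken from the post.
    """
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))

    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    hits = 0
    for _ in range(n_permutations):
        # Shuffling the pooled values is equivalent to reassigning labels.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1

    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_permutations + 1)
```

Shuffling individual observations assumes they are exchangeable under the null hypothesis. If the data is hierarchical (for example, many tokens drawn from the same prompt), labels would need to be shuffled at the level of the independent unit instead, which ties back to the tl;dr's point about independence.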

---

Outline:

(00:32) tl;dr

(01:36) 1. If you want to make a claim based on p-values, don't settle for p = .02

(02:05) 1.1. Some history and culture

(03:16) 1.2. For all practical purposes...

(05:51) 2. Independence among observations is critical

(06:33) 2.1. Data is often hierarchical

(07:22) 2.2. Hierarchical interdependencies are large

(10:45) 2.3. Categories allow independence to be easily achieved

(11:50) 2.4. With continuous variables, things are hairier

(14:34) 3. Effect sizes are only meaningful in context

(16:02) 3.1. Contextualizing the quip

(18:14) 3.2. Back to mech interp

(19:04) 4. Perfectly controlling confounds is hard

(21:46) 4.1. Challenging confounds

(23:14) 4.2. Bright possibilities

(24:30) 5. Simplicity is valuable, and you can usually ignore statistical tests' assumptions

(25:36) 5.1. Readability matters

(26:10) 5.2. Slight complexity that I still encourage

(27:17) 6. Complexity breeds statistical demons, but careful permutation testing can slay them

(30:34) 7. Conclusion

The original text contained 14 footnotes which were omitted from this narration.

---

First published:
August 6th, 2025

Source:
https://www.lesswrong.com/posts/GxhtzqMwdTHo6326y/statistical-takes-for-mech-interp-research-and-beyond

---

Narrated by TYPE III AUDIO.
