Start / LessWrong (30+ Karma) / Statistical takes for mech interp research and beyond by paul bogdan

“Statistical takes for mech interp research and beyond” by Paul Bogdan

31 min • 6 augusti 2025

I am currently a MATS 8.0 scholar studying mechanistic interpretability with Neel Nanda. I’m also a postdoc in psychology/neuroscience. My perhaps most notable paper analyzed the last 20 years of psychology research, searching for trends in what papers do and do not replicate. I have some takes on statistics.

tl;dr

Small p-values are nice.
Unless they're suspiciously small.

Statistical assumptions can be bent.
Except for independence.

Practical significance beats statistical significance.
Although practicality depends on context.

The measure is not the confound.
Sometimes it's close enough.

Readability often beats rigor.
But fearing rigor means you probably need it.

Simple is better than complex.
Complex is better than wrong.

Complex wrongs are the worst.
But permutation tests can help reveal them.

This post offers advice for frequentist and classifier-based analysis. I try to focus on points that are practical, diverse, and non-obvious. I emphasize relevancy for mechanistic interpretability research, but [...]

---

Outline:

(00:32) tl;dr

(01:36) 1. If you want to make a claim based on p-values, don't settle for p = .02

(02:05) 1.1. Some history and culture

(03:16) 1.2. For all practical purposes...

(05:51) 2. Independence among observations is critical

(06:33) 2.1. Data is often hierarchical

(07:22) 2.2. Hierarchical interdependencies are large

(10:45) 2.3. Categories allow independence to be easily achieved

(11:50) 2.4. With continuous variables, things are hairier

(14:34) 3. Effect sizes are only meaningful in context

(16:02) 3.1. Contextualizing the quip

(18:14) 3.2. Back to mech interp

(19:04) 4. Perfectly controlling confounds is hard

(21:46) 4.1. Challenging confounds

(23:14) 4.2. Bright possibilities

(24:30) 5. Simplicity is valuable, and you can usually ignore statistical tests' assumptions

(25:36) 5.1. Readability matters

(26:10) 5.2. Slight complexity that I still encourage

(27:17) 6. Complexity breeds statistical demons, but careful permutation testing can slay them

(30:34) 7. Conclusion

The original text contained 14 footnotes which were omitted from this narration.

---

First published:
August 6th, 2025

Source:
https://www.lesswrong.com/posts/GxhtzqMwdTHo6326y/statistical-takes-for-mech-interp-research-and-beyond

---

Narrated by TYPE III AUDIO.

Senaste avsnitt

“CoT May Be Highly Informative Despite ‘Unfaithfulness’ [METR]” by GradientDissenter

12 augusti | 67 min

“Statistical takes for mech interp research and beyond” by Paul Bogdan

tl;dr

Senaste avsnitt

“CoT May Be Highly Informative Despite ‘Unfaithfulness’ [METR]” by GradientDissenter

“Measuring intelligence and reverse-engineering goals” by jessicata

“The trajectory of the future could soon get set in stone” by wdmacaskill

[Linkpost] “Thoughts on extrapolating time horizons” by Nikola Jurkovic

“How Does A Blind Model See The Earth?” by henry

“If worker coops are so productive, why aren’t they everywhere?” by B Jacobs

“GPT-5s Are Alive: Basic Facts, Benchmarks and the Model Card” by Zvi

“Breaking the Cycle of Trauma and Tyranny: How Psychological Wounds Shape History” by Dawn Drescher

“My Least Libertarian Opinion: Ban Exclusivity Deals*” by Brendan Long

“Having children is a deeply personal choice. Do not use ethical arguments to try to shame people into having them or not having them.” by KatWoods

“A Self-Dialogue on The Value Proposition of Romantic Relationships” by johnswentworth

“4 places where you can put LLM monitoring” by Fabien Roger, Buck

“OpenAI’s GPT-OSS Is Already Old News” by Zvi

“The Tortoise and the Language Model (A Fable After Hofstadter)” by mwatkins

“Extract-and-Evaluate Monitoring Can Significantly Enhance CoT Monitoring Performance (Research Note)” by Rauno Arike, RohanS, Shubhorup Biswas

“What would a human pretending to be an AI say?” by Brendan Long

“How anticipatory cover-ups go wrong” by Kaj_Sotala

“METR’s Evaluation of GPT-5” by GradientDissenter

“Civil Service: a Victim or a Villain?” by Martin Sustrik

“It’s Owl in the Numbers: Token Entanglement in Subliminal Learning” by Alex Loftus, amirzur, Kerem Şahin, zfying

“No, Rationalism Is Not a Cult” by Liam Robins

“Interview with Kelsey Piper on Self-Censorship and the Vibe Shift” by Zack_M_Davis

“Claude, GPT, and Gemini All Struggle to Evade Monitors” by Vincent Cheng, Thomas Kwa

“Opus 4.1 Is An Incremental Improvement” by Zvi

“Re: Recent Anthropic Safety Research” by Eliezer Yudkowsky

“Inscrutability was always inevitable, right?” by Steven Byrnes

“Statistical takes for mech interp research and beyond” by Paul Bogdan

[Linkpost] “OpenAI Releases gpt-oss” by anaguma

“Childhood and Education #13: College” by Zvi

“The perils of under- vs over-sculpting AGI desires” by Steven Byrnes

“The Problem” by Rob Bensinger, tanagrabeast, yams, So8res, Eliezer Yudkowsky, Gretta Duleba

“Concept Poisoning: Probing LLMs without probes” by Jan Betley, jorio, dylan_f, Owain_Evans

“Narrow finetuning is different” by cloud, Stewy Slocum

“On Altman’s Interview With Theo Von” by Zvi

“Interview with Steven Byrnes on Brain-like AGI, Foom & Doom, and Solving Technical Alignment” by Liron, Steven Byrnes

“Towards Alignment Auditing as a Numbers-Go-Up Science” by Sam Marks

“Alcohol is so bad for society that you should probably stop drinking” by KatWoods

“Permanent Disempowerment is the Baseline” by Vladimir_Nesov

“Should we aim for flourishing over mere survival? The Better Futures series.” by wdmacaskill

“Saying Goodbye” by sapphire

“Emotions Make Sense” by DaystarEld

“Whence the Inkhaven Residency?” by Ben Pace

“Many prediction markets would be better off as batched auctions” by William Howard

“How many species has humanity driven extinct?” by Raemon

“SB-1047 Documentary: The Post-Mortem” by Michaël Trazzi

“Podcast: Lincoln Quirk from Wave” by Elizabeth

“The Dark Arts As A Scaffolding Skill For Rationality” by Screwtape

“Steve Petersen funding” by abramdemski

“Two Kinds of Do Overs” by jefftk

“Red-Thing-Ism” by J Bostock

“Do Not Render Your Counterfactuals” by AlphaAndOmega

“Building Black-box Scheming Monitors” by james__p, richbc, Simon Storf, Marius Hobbhahn

“Follow-up to ‘My Empathy Is Rarely Kind’” by johnswentworth

“I am worried about near-term non-LLM AI developments” by testingthewaters

“Childhood and Education: College Admissions” by Zvi

“Optimizing The Final Output Can Obfuscate CoT (Research Note)” by lukemarks, jacob_drori, cloud, TurnTrout

“China proposes new global AI cooperation organisation” by Matrice Jacobine

“My Empathy Is Rarely Kind” by johnswentworth

“The many paths to permanent disempowerment even with shutdownable AIs (MATS project summary for feedback)” by GideonF

“Spilling the Tea” by Zvi

“I wrote a song parody” by CronoDAS

“Low P(x-risk) as the Bailey for Low P(doom)” by Vladimir_Nesov

“About 30% of Humanity’s Last Exam chemistry/biology answers are likely wrong” by bohaska

“Procrastination Drill” by silentbob

“Teaching kids to swim” by Steven Byrnes

“Recursions on LessOnline 2025” by Error

“Simplex Progress Report - July 2025” by Adam Shai, Paul Riechers, hrbigelow, Eric Alt, mntss

“Optimally Combining Probe Monitors and Black Box Monitors” by Tim Hua, jamesbaskerville, BionicD0LPH1N, Mia Hopman, Aryan Bhatt, Tyler Tracy

“AI Companion Piece” by Zvi

“This Is Not Life” by samhealy

“Sydney Bing Wikipedia Article: Sydney (Microsoft Prometheus)” by jdp

“Maya’s Escape” by Bridgett Kay

[Linkpost] “The Purpose of a System is what it Rewards” by robotelvis

“my experience on glp-1s as a thin person” by AnnaJo

“Anthropic Faces Potentially ‘Business-Ending’ Copyright Lawsuit” by garrison

“HPMOR: The (Probably) Untold Lore” by Gretta Duleba, Eliezer Yudkowsky

“We Built a Tool to Protect Your Dataset From Simple Scrapers” by TurnTrout, Edward Turner, Dipika Khullar