LessWrong (30+ Karma)

“Glass box learners want to be black box” by Cole Wyeth

10 min • May 11, 2025

Epistemic status: intuitive speculation with scattered mathematical justification.

My goal here is to interrogate the dream of writing a beautiful algorithm for intelligence and thereby ensuring safety. For example:

I don't know precisely what alternative he had in mind, but I only seem to remember reading clean functional programs from MIRI, so that is one possibility. Whether or not anyone else endorses it, that is the prototypical picture I have in mind as the holy grail of glass box AGI implementation. In my mind, it has careful comments explaining the precisely chosen, recursive prior and decision rules that make it go "foom."

Is the "inscrutable" complexity of deep neural networks unavoidable? There has been some prior discussion of the desire to avoid it as map-territory confusion, and I am not sure if that is true (though I have some fairly subtle suspicions). However, I want to [...]

---

Outline:

(03:11) Optimality as a boundary

(04:39) Approaching the boundary

(07:02) Implications for safety

The original text contained 6 footnotes which were omitted from this narration.

---

First published:
May 10th, 2025

Source:
https://www.lesswrong.com/posts/boodbr2PXpEEMGrfx/glass-box-learners-want-to-be-black-box

---

Narrated by TYPE III AUDIO.

---

Images from the article:

Simple stick figure sketch showing a tic-tac-toe game and spider-like drawings.
Simple stick figure drawing with gears and a tic-tac-toe board pattern.
Abstract line drawing with stick figures, gears, and geometric shapes.
Abstract line drawing with gears, tic-tac-toe grid, and stick figures.
Static-filled rectangle with simple stick figure doodles around the edges.
Screenshot of Eliezer Yudkowsky tweets.

Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.
