Epistemic Status: Over years of reading alignment plans and studying agent foundations, this is my first serious attempt to formulate an alignment research program in which I (Cole Wyeth) have not been able to find any critical flaws. It is far from a complete solution, but I think it is a meaningful decomposition of the problem into modular pieces that can be addressed by technical means - that is, it seems to overcome many of the philosophical barriers to AI alignment. I have attempted to make the necessary assumptions clear throughout. The main reason I am excited about this plan is that the assumptions seem acceptable to both agent foundations researchers and ML engineers; that is, I do not believe there are any naive assumptions about the nature of intelligence OR any computationally intractable obstacles to implementation. This program (tentatively ARAD := Adversarially Robust Augmentation and Distillation) owes [...]
---
Outline:
(04:31) High-level Implementation
(06:03) Technical Justification
(17:45) Implementation Details
The original text contained 1 footnote which was omitted from this narration.
---
First published:
May 25th, 2025
---
Narrated by TYPE III AUDIO.