Dr. Steven Byrnes is one of the few people who both understands why alignment is hard and is taking a serious technical shot at solving it. He's the author of several recently popular posts on the topic.
After his UC Berkeley physics PhD and Harvard postdoc, he became an AGI safety researcher at Astera. He's now deep in the neuroscience of reverse-engineering how the human brain actually works, knowledge that could plausibly help us solve the technical AI alignment problem.
He has a whopping 90% P(Doom), but argues that LLMs will plateau before becoming truly dangerous, and the real threat will come from next-generation “brain-like AGI” based on actor-critic reinforcement learning.
We cover Steve's "two subsystems" model of the brain, why current AI safety approaches miss the mark, his disagreements with "social evolution" [...]
---
Outline:
(01:18) Video
(01:24) Podcast
(01:44) Transcript
(01:47) Cold Open
(02:13) Introducing Steven Byrnes
(09:10) Path to Neuroscience and AGI Safety
(18:53) Research Direction and Brain-like AGI
(23:47) The Two Brain Subsystems
(45:28) Language Acquisition and Learning
(50:19) LLM Limitations
(01:10:10) Brain-like AGI
(01:16:04) Actor-Critic Reinforcement Learning
(01:41:10) Alignment Solutions and Reward Functions
(01:48:31) Actor-Critic Model and Brain Architecture
(02:00:33) Current AI vs Future Paradigms
(02:06:39) LLM Limitations and Capabilities
(02:13:24) Inner vs Outer Alignment
(02:19:28) AI Policy and Pause AI Discussion
(02:25:49) Lightning Round
(02:32:19) Closing Thoughts
---
First published:
August 5th, 2025
---
Narrated by TYPE III AUDIO.