LessWrong (30+ Karma)

[Linkpost] “No, We’re Not Getting Meaningful Oversight of AI” by Davidmanheim

2 min • 10 July 2025
This is a link post.

One of the most common (and comfortable) assumptions in AI safety discussions—especially outside of technical alignment circles—is that oversight will save us. Whether it's a human in the loop, a red team audit, or a governance committee reviewing deployments, oversight is invoked as the method by which we’ll prevent unacceptable outcomes.

It shows up everywhere: in policy frameworks, in corporate safety reports, and in standards documents. Sometimes it's explicit, as when the EU AI Act says that high-risk AI systems must be subject to human oversight, or stated as an assumption, as in a DeepMind paper also released yesterday, which argues that scheming won't happen because AI won't be able to evade oversight. Other times it's implicit: firms claiming that they are mitigating risk through regular audits and fallback procedures, or arguments that no one will deploy unsafe systems in places without sufficient oversight.

But either [...]

---

First published:
July 9th, 2025

Source:
https://www.lesswrong.com/posts/25dsPH6CuRXPBkGHN/no-we-re-not-getting-meaningful-oversight-of-ai

Linkpost URL:
https://arxiv.org/abs/2507.03525

---

Narrated by TYPE III AUDIO.
