Good podcast

[Linkpost] “In defense of the amyloid hypothesis” by dsj

15 August | 1 min

“Training a Reward Hacker Despite Perfect Labels” by ariana_azarbal, vgillioz, TurnTrout

15 August | 13 min

“Somebody invented a better bookmark” by Alex_Altair

14 August | 4 min

[Linkpost] “METR Research Update: Algorithmic vs. Holistic Evaluation” by David Rein

14 August | 1 min

“Launching new AIXI research community website + reading group(s)” by Cole Wyeth

13 August | 1 min

[Linkpost] “Why Are There So Many Rationalist Cults?” by omark

13 August | 1 min

“Enlightenment AMA” by lsusr

13 August | 2 min

“Mech Interp Wiki Page and Why You Should Edit Wikipedia” by Noah Birnbaum, JoNeedsSleep

13 August | 3 min

“Generalized Coming Out Of The Closet” by johnswentworth

12 August | 7 min

“The Bone-Chilling Evil of Factory Farming” by Bentham’s Bulldog

12 August | 10 min

“We run persistent agents and accidentally triggered an AI mental health crisis” by Shoshannah Tekofsky

12 August | 4 min

“CoT May Be Highly Informative Despite ‘Unfaithfulness’ [METR]” by GradientDissenter

12 August | 67 min

“Measuring intelligence and reverse-engineering goals” by jessicata

12 August | 18 min

“The trajectory of the future could soon get set in stone” by wdmacaskill

12 August | 6 min

[Linkpost] “Thoughts on extrapolating time horizons” by Nikola Jurkovic

12 August | 4 min

“How Does A Blind Model See The Earth?” by henry

11 August | 21 min

“If worker coops are so productive, why aren’t they everywhere?” by B Jacobs

11 August | 8 min

“GPT-5s Are Alive: Basic Facts, Benchmarks and the Model Card” by Zvi

11 August | 65 min

“Breaking the Cycle of Trauma and Tyranny: How Psychological Wounds Shape History” by Dawn Drescher

11 August | 25 min

“My Least Libertarian Opinion: Ban Exclusivity Deals*” by Brendan Long

11 August | 4 min

“Having children is a deeply personal choice. Do not use ethical arguments to try to shame people into having them or not having them.” by KatWoods

11 August | 4 min

“A Self-Dialogue on The Value Proposition of Romantic Relationships” by johnswentworth

10 August | 14 min

“4 places where you can put LLM monitoring” by Fabien Roger, Buck

10 August | 15 min

“OpenAI’s GPT-OSS Is Already Old News” by Zvi

9 August | 41 min

“The Tortoise and the Language Model (A Fable After Hofstadter)” by mwatkins

9 August | 8 min

“Extract-and-Evaluate Monitoring Can Significantly Enhance CoT Monitoring Performance (Research Note)” by Rauno Arike, RohanS, Shubhorup Biswas

9 August | 20 min

“What would a human pretending to be an AI say?” by Brendan Long

9 August | 2 min

“How anticipatory cover-ups go wrong” by Kaj_Sotala

8 August | 11 min

“METR’s Evaluation of GPT-5” by GradientDissenter

7 August | 48 min

“Civil Service: a Victim or a Villain?” by Martin Sustrik

7 August | 8 min

“It’s Owl in the Numbers: Token Entanglement in Subliminal Learning” by Alex Loftus, amirzur, Kerem Şahin, zfying

7 August | 11 min

“No, Rationalism Is Not a Cult” by Liam Robins

7 August | 20 min

“Interview with Kelsey Piper on Self-Censorship and the Vibe Shift” by Zack_M_Davis

7 August | 26 min

“Claude, GPT, and Gemini All Struggle to Evade Monitors” by Vincent Cheng, Thomas Kwa

7 August | 12 min

“Opus 4.1 Is An Incremental Improvement” by Zvi

7 August | 14 min

“Re: Recent Anthropic Safety Research” by Eliezer Yudkowsky

6 August | 9 min

“Inscrutability was always inevitable, right?” by Steven Byrnes

6 August | 4 min

“Statistical takes for mech interp research and beyond” by Paul Bogdan

6 August | 31 min

[Linkpost] “OpenAI Releases gpt-oss” by anaguma

6 August | 4 min

“Childhood and Education #13: College” by Zvi

6 August | 45 min

“The perils of under- vs over-sculpting AGI desires” by Steven Byrnes

6 August | 45 min

“The Problem” by Rob Bensinger, tanagrabeast, yams, So8res, Eliezer Yudkowsky, Gretta Duleba

5 August | 50 min

“Concept Poisoning: Probing LLMs without probes” by Jan Betley, jorio, dylan_f, Owain_Evans

5 August | 33 min

“Narrow finetuning is different” by cloud, Stewy Slocum

5 August | 7 min

“On Altman’s Interview With Theo Von” by Zvi

5 August | 17 min

“Interview with Steven Byrnes on Brain-like AGI, Foom & Doom, and Solving Technical Alignment” by Liron, Steven Byrnes

5 August | 154 min

“Towards Alignment Auditing as a Numbers-Go-Up Science” by Sam Marks

4 August | 18 min

“Alcohol is so bad for society that you should probably stop drinking” by KatWoods

4 August | 16 min

“Permanent Disempowerment is the Baseline” by Vladimir_Nesov

4 August | 11 min

“Should we aim for flourishing over mere survival? The Better Futures series.” by wdmacaskill

4 August | 9 min

“Saying Goodbye” by sapphire

4 August | 9 min

“Emotions Make Sense” by DaystarEld

3 August | 36 min

“Whence the Inkhaven Residency?” by Ben Pace

2 August | 5 min

“Many prediction markets would be better off as batched auctions” by William Howard

2 August | 9 min

“How many species has humanity driven extinct?” by Raemon

2 August | 1 min

“SB-1047 Documentary: The Post-Mortem” by Michaël Trazzi

2 August | 10 min

“Podcast: Lincoln Quirk from Wave” by Elizabeth

2 August | 2 min

“The Dark Arts As A Scaffolding Skill For Rationality” by Screwtape

1 August | 11 min

“Steve Petersen funding” by abramdemski

1 August | 1 min

“Two Kinds of Do Overs” by jefftk

1 August | 4 min

“Red-Thing-Ism” by J Bostock

1 August | 6 min

“Do Not Render Your Counterfactuals” by AlphaAndOmega

1 August | 10 min

“Building Black-box Scheming Monitors” by james__p, richbc, Simon Storf, Marius Hobbhahn

31 July | 24 min

“Follow-up to ‘My Empathy Is Rarely Kind’” by johnswentworth

31 July | 4 min

“I am worried about near-term non-LLM AI developments” by testingthewaters

31 July | 11 min

“Childhood and Education: College Admissions” by Zvi

31 July | 34 min

“Optimizing The Final Output Can Obfuscate CoT (Research Note)” by lukemarks, jacob_drori, cloud, TurnTrout

30 July | 12 min

“China proposes new global AI cooperation organisation” by Matrice Jacobine

30 July | 2 min

“My Empathy Is Rarely Kind” by johnswentworth

30 July | 6 min

“The many paths to permanent disempowerment even with shutdownable AIs (MATS project summary for feedback)” by GideonF

30 July | 18 min

“Spilling the Tea” by Zvi

29 July | 23 min

“I wrote a song parody” by CronoDAS

29 July | 3 min

“Low P(x-risk) as the Bailey for Low P(doom)” by Vladimir_Nesov

29 July | 5 min

“About 30% of Humanity’s Last Exam chemistry/biology answers are likely wrong” by bohaska

29 July | 7 min

“Procrastination Drill” by silentbob

29 July | 5 min

“Teaching kids to swim” by Steven Byrnes

29 July | 5 min

“Recursions on LessOnline 2025” by Error

29 July | 33 min

“Simplex Progress Report - July 2025” by Adam Shai, Paul Riechers, hrbigelow, Eric Alt, mntss

29 July | 33 min

“Optimally Combining Probe Monitors and Black Box Monitors” by Tim Hua, jamesbaskerville, BionicD0LPH1N, Mia Hopman, Aryan Bhatt, Tyler Tracy

28 July | 13 min

“AI Companion Piece” by Zvi

28 July | 29 min

“This Is Not Life” by samhealy

28 July | 44 min

“Sydney Bing Wikipedia Article: Sydney (Microsoft Prometheus)” by jdp

28 July | 14 min

“Maya’s Escape” by Bridgett Kay

27 July | 20 min

[Linkpost] “The Purpose of a System is what it Rewards” by robotelvis

27 July | 3 min

“my experience on glp-1s as a thin person” by AnnaJo

26 July | 17 min

“Anthropic Faces Potentially ‘Business-Ending’ Copyright Lawsuit” by garrison

26 July | 14 min

“HPMOR: The (Probably) Untold Lore” by Gretta Duleba, Eliezer Yudkowsky

25 July | 68 min

“We Built a Tool to Protect Your Dataset From Simple Scrapers” by TurnTrout, Edward Turner, Dipika Khullar

25 July | 6 min

[Linkpost] “Reasoning-Finetuning Repurposes Latent Representations in Base Models” by Jake Ward, lccqqqqq, Neel Nanda

25 July | 6 min

“Building and evaluating alignment auditing agents” by Sam Marks, Sam Bowman, Euan Ong, Johannes Treutlein, evhub

24 July | 11 min

“The Whole Check” by JustisMills

24 July | 7 min

“‘Behaviorist’ RL reward functions lead to scheming” by Steven Byrnes

24 July | 21 min

[Linkpost] “A brief perspective from an IMO coordinator” by DirectedEvolution

23 July | 2 min

“Steering Out-of-Distribution Generalization with Concept Ablation Fine-Tuning” by kh4dien, Helena Casademunt, Adam Karvonen, Sam Marks, Senthooran Rajamanoharan, Neel Nanda

23 July | 12 min

“On ‘ChatGPT Psychosis’ and LLM Sycophancy” by jdp

23 July | 30 min

“Google and OpenAI Get 2025 IMO Gold” by Zvi

23 July | 58 min

“Unfaithful chain-of-thought as nudged reasoning” by Paul Bogdan, Uzay Macar, Arthur Conmy, Neel Nanda

23 July | 20 min

“Subliminal Learning: LLMs Transmit Behavioral Traits via Hidden Signals in Data” by cloud, mle, Owain_Evans

22 July | 10 min

“Directly Try Solving Alignment for 5 weeks” by Kabir Kumar

22 July | 14 min

[Linkpost] “Why Reality Has A Well-Known Math Bias” by Linch

22 July | 3 min

“Do ‘adult developmental stages’ theories have any pre-theoretical motivation?” by Said Achmiz

22 July | 6 min

“Monthly Roundup #32: July 2025” by Zvi

21 July | 75 min

“If Anyone Builds It, Everyone Dies: Call for Translators (for Supplementary Materials)” by yams

21 July | 3 min

“Detecting High-Stakes Interactions with Activation Probes” by Arrrlex, williambankes, Urja Pawar, Phil Bland, David Scott Krueger (formerly: capybaralet), Dmitrii Krasheninnikov

21 July | 11 min

“HRT in Menopause: A candidate for a case study of epistemology in epidemiology, statistics & medicine” by foodforthought

21 July | 7 min

[Linkpost] “GDM also claims IMO gold medal” by Yair Halberstadt

21 July | 1 min

“[Fiction] Our Trial” by Nina Panickssery

21 July | 7 min

“LLMs Can’t See Pixels or Characters” by Brendan Long

21 July | 9 min

“Plato’s Trolley” by dr_s

21 July | 12 min

“Your AI Safety org could get EU funding up to €9.08M. Here’s how (+ free personalized support)” by SamuelK

20 July | 6 min

“Shallow Water is Dangerous Too” by jefftk

20 July | 3 min

“Make More Grayspaces” by Duncan Sabien (Inactive)

20 July | 23 min

[Linkpost] “AI Gets IMO Gold Medal: via general-purpose RL, not via narrow, task specific methodology” by Mikhail Samin

19 July | 4 min

“A night-watchman ASI as a first step toward a great future” by Eric Neyman

19 July | 21 min

“Love stays loved (formerly ‘Skin’)” by Swimmer963 (Miranda Dixon-Luinenburg)

18 July | 51 min

“Why it’s hard to make settings for high-stakes control research” by Buck

18 July | 7 min

“On METR’s AI Coding RCT” by Zvi

18 July | 21 min

“Trying the Obvious Thing” by PranavG, Gabriel Alfour

18 July | 7 min

“Video and transcript of talk on ‘Can goodness compete?’” by Joe Carlsmith

17 July | 67 min

“On being sort of back and sort of new here” by Loki zen

17 July | 5 min

“Comment on ‘Four Layers of Intellectual Conversation’” by Zack_M_Davis

17 July | 10 min

“Selective Generalization: Improving Capabilities While Maintaining Alignment” by ariana_azarbal, Matthew A. Clarke, jorio, Cailley Factor, cloud

17 July | 18 min

“Bodydouble / Thinking Assistant matchmaking” by Raemon

17 July | 4 min

“Kimi K2” by Zvi

16 July | 27 min

“Grok 4 Various Things” by Zvi

16 July | 74 min

“Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety” by Tomek Korbak, Mikita Balesni, Vlad Mikulik, Rohin Shah

15 July | 2 min

“Do confident short timelines make sense?” by TsviBT, abramdemski

15 July | 131 min

[Linkpost] “LLM-induced craziness and base rates” by Kaj_Sotala

15 July | 4 min

[Linkpost] “Bernie Sanders (I-VT) mentions AI loss of control risk in Gizmodo interview” by Matrice Jacobine

14 July | 3 min

“Recent Redwood Research project proposals” by ryan_greenblatt, Buck, Julian Stastny, joshc, Alex Mallen, Adam Kaufman, Tyler Tracy, Aryan Bhatt, Joey Yudelson

14 July | 8 min

“Narrow Misalignment is Hard, Emergent Misalignment is Easy” by Edward Turner, Anna Soligo, Senthooran Rajamanoharan, Neel Nanda

14 July | 11 min

“Do LLMs know what they’re capable of? Why this matters for AI safety, and initial findings” by Casey Barkan, Sid Black, Oliver Sourbut

14 July | 24 min

“Self-preservation or Instruction Ambiguity? Examining the Causes of Shutdown Resistance” by Senthooran Rajamanoharan, Neel Nanda

14 July | 19 min

“Worse Than MechaHitler” by Zvi

14 July | 46 min

“How Does Time Horizon Vary Across Domains?” by Thomas Kwa

14 July | 37 min

“xAI’s Grok 4 has no meaningful safety guardrails” by eleventhsavi0r

14 July | 11 min

“Stop and check! The parable of the prince and the dog” by Dumbledore’s Army

14 July | 3 min

“OpenAI Model Differentiation 101” by Zvi

14 July | 21 min

“10x more training compute = 5x greater task length (kind of)” by Expertium

14 July | 5 min

“Three Missing Cakes, or One Turbulent Critic?” by Benquo

14 July | 5 min

“You can get LLMs to say almost anything you want” by Kaj_Sotala

13 July | 24 min

“against that one rationalist mashal about japanese fifth-columnists” by Fraser

13 July | 6 min

“Surprises and learnings from almost two months of Leo Panickssery” by Nina Panickssery

13 July | 12 min

“Vitalik’s Response to AI 2027” by Daniel Kokotajlo

12 July | 24 min

“the jackpot age” by thiccythot

12 July | 13 min

“Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity” by habryka

11 July | 12 min

[Linkpost] “Guide to Redwood’s writing” by Julian Stastny

11 July | 1 min

“So You Think You’ve Awoken ChatGPT” by JustisMills

11 July | 18 min

[Linkpost] “Open Global Investment as a Governance Model for AGI” by Nick Bostrom

10 July | 2 min

“what makes Claude 3 Opus misaligned” by janus

10 July | 9 min

“Lessons from the Iraq War about AI policy” by Buck

10 July | 8 min

“Generalized Hangriness: A Standard Rationalist Stance Toward Emotions” by johnswentworth

10 July | 12 min

“Evaluating and monitoring for AI scheming” by Vika, Scott Emmons, Erik Jenner, Mary Phuong, Lewis Ho, Rohin Shah

10 July | 11 min

“White Box Control at UK AISI - Update on Sandbagging Investigations” by Joseph Bloom, Jordan Taylor, Connor Kissane, Sid Black, merizian, alexdzm, jacoba, Ben Millwood, Alan Cooney

10 July | 41 min

“80,000 Hours is producing AI in Context — a new YouTube channel. Our first video, about the AI 2027 scenario, is up!” by chanamessinger

10 July | 6 min

[Linkpost] “No, We’re Not Getting Meaningful Oversight of AI” by Davidmanheim

10 July | 2 min

“What’s worse, spies or schemers?” by Buck, Julian Stastny

9 July | 10 min

“Applying right-wing frames to AGI (geo)politics” by Richard_Ngo

9 July | 6 min

“No, Grok, No” by Zvi

9 July | 39 min

“A deep critique of AI 2027’s bad timeline models” by titotal

9 July | 73 min

“Subway Particle Levels Aren’t That High” by jefftk

9 July | 3 min

“An Opinionated Guide to Using Anki Correctly” by Luise

9 July | 54 min

“Why Do Some Language Models Fake Alignment While Others Don’t?” by abhayesian, John Hughes, Alex Mallen, Jozdien, janus, Fabien Roger

8 July | 11 min

“Balsa Update: Springtime in DC” by Zvi

8 July | 20 min

[Linkpost] “A Theory of Structural Independence” by Matthias G. Mayer

8 July | 3 min

“On Alpha School” by Zvi

8 July | 24 min

“You Can’t Objectively Compare Seven Bees to One Human” by J Bostock

8 July | 7 min

“Literature Review: Risks of MDMA” by Elizabeth

7 July | 8 min

“45 - Samuel Albanie on DeepMind’s AGI Safety Approach” by DanielFilan

7 July | 77 min

“On the functional self of LLMs” by eggsyntax

7 July | 23 min

“Shutdown Resistance in Reasoning Models” by benwr, JeremySchlatter, Jeffrey Ladish

6 July | 18 min

“The Cult of Pain” by Martin Sustrik

5 July | 6 min

[Linkpost] “Claude is a Ravenclaw” by Adam Newgas

5 July | 2 min

“‘Buckle up bucko, this ain’t over till it’s over.’” by Raemon

5 July | 6 min

“How much novel security-critical infrastructure do you need during the singularity?” by Buck

5 July | 10 min

“‘AI for societal uplift’ as a path to victory” by Raymond Douglas

4 July | 4 min

“Two proposed projects on abstract analogies for scheming” by Julian Stastny

4 July | 7 min

“Outlive: A Critical Review” by MichaelDickens

4 July | 60 min

“Authors Have a Responsibility to Communicate Clearly” by TurnTrout

4 July | 11 min

[Linkpost] “MIRI Newsletter #123” by Harlan, Rob Bensinger

4 July | 4 min

[Linkpost] “Research Note: Our scheming precursor evals had limited predictive power for our in-context scheming evals” by Marius Hobbhahn

3 July | 3 min

“Call for suggestions - AI safety course” by boazbarak

3 July | 3 min

[Linkpost] “IABIED: Advertisement design competition” by yams

3 July | 2 min

“Congress Asks Better Questions” by Zvi

3 July | 30 min

“Curing PMS with Hair Loss Pills” by David Lorell

2 July | 16 min

“AI Task Length Horizons in Offensive Cybersecurity” by Sean Peters

2 July | 24 min

“Race and Gender Bias As An Example of Unfaithful Chain of Thought in the Wild” by Adam Karvonen, Sam Marks

2 July | 8 min

“There are two fundamentally different constraints on schemers” by Buck

2 July | 7 min

“‘What’s my goal?’” by Raemon

2 July | 4 min

“A Simple Explanation of AGI Risk” by TurnTrout

2 July | 10 min

“AI Moratorium Stripped From BBB” by Zvi

1 July | 10 min

“Scientific Discovery in the Age of Artificial Intelligence” by Jessica Rumbelow

1 July | 20 min

“SLT for AI Safety” by Jesse Hoogland

1 July | 8 min

“The best simple argument for Pausing AI?” by Gary Marcus

1 July | 2 min

“SAE on activation differences” by Santiago Aranguri, jacob_drori, Neel Nanda

1 July | 11 min

“What We Learned Trying to Diff Base and Chat Models (And Why It Matters)” by Clément Dumas, Julian Minder, Neel Nanda

30 June | 20 min

“If you want to be vegan but you worry about health effects of no meat, consider being vegan except for mussels/oysters” by KatWoods

30 June | 1 min

[Linkpost] “Project Vend: Can Claude run a small shop?” by Gunnar_Zarncke

30 June | 1 min

“Paradigms for computation” by Cole Wyeth

30 June | 20 min

“life lessons from poker” by thiccythot

30 June | 9 min

“Circuits in Superposition 2: Now with Less Wrong Math” by Linda Linsefors, Lucius Bushnaq

30 June | 38 min

“I underestimated safety research speedups from safe AI” by Dan Braun

30 June | 6 min

“Conciseness Manifesto” by Vasyl Dotsenko

29 June | 0 min

“Support for bedrock liberal principles seems to be in pretty bad shape these days” by Max H

29 June | 7 min

[Linkpost] “A Depressed Shrink Tries Shrooms” by AlphaAndOmega

29 June | 1 min

[Linkpost] “[Paper] Stochastic Parameter Decomposition” by Lee Sharkey, Lucius Bushnaq, Dan Braun

28 June | 2 min

“Childhood and Education #11: The Art of Learning” by Zvi

28 June | 24 min

“Proposal for making credible commitments to AIs.” by Cleo Nardo

28 June | 5 min

“Epoch: What is Epoch?” by Zach Stein-Perlman

27 June | 15 min

“Recent and forecasted rates of software and hardware progress” by elifland

27 June | 2 min

“Jankily controlling superintelligence” by ryan_greenblatt

27 June | 14 min

“Help the AI 2027 team make an online AGI wargame” by Jonas V

27 June | 2 min

“A Guide For LLM-Assisted Web Research” by nikos, dschwarz, Lawrence Phillips, FutureSearch

27 June | 15 min

“If Not Now, When?” by Yair Halberstadt

27 June | 2 min

“A case for courage, when speaking of AI danger” by So8res

27 June | 10 min

“The Industrial Explosion” by rosehadshar, Tom Davidson

26 June | 32 min

“Summary of John Halstead’s Book-Length Report on Existential Risks From Climate Change” by Bentham’s Bulldog

26 June | 45 min

“Tech for Thinking” by sarahconstantin

26 June | 4 min

“Lurking in the Noise” by J Bostock

25 June | 8 min

[Linkpost] “New Paper: Ambiguous Online Learning” by Vanessa Kosoy

25 June | 3 min

“Melatonin Self-Experiment Results” by silentbob

25 June | 20 min

“What does 10x-ing effective compute get you?” by ryan_greenblatt

25 June | 21 min

“A regime-change power-vacuum conjecture about group belief” by TsviBT

25 June | 6 min

“Analyzing A Critique Of The AI 2027 Timeline Forecasts” by Zvi

24 June | 55 min

“Why ‘training against scheming’ is hard” by Marius Hobbhahn

24 June | 24 min

“My pitch for the AI Village” by Daniel Kokotajlo

24 June | 13 min

“Situational Awareness: A One-Year Retrospective” by Nathan Delisle

24 June | 33 min

“Compressed Computation is (probably) not Computation in Superposition” by Jai Bhagat, Sara Molas Medina, Giorgi Giglemiani, StefanHex

23 June | 22 min

“‘It isn’t magic’” by Ben (Berlin)

23 June | 4 min

“Foom & Doom 1: ‘Brain in a box in a basement’” by Steven Byrnes

23 June | 59 min

“Foom & Doom 2: Technical alignment is hard” by Steven Byrnes

23 June | 57 min

“Comparing risk from internally-deployed AI to insider and outsider threats from humans” by Buck

23 June | 5 min

“Clarifying ‘wisdom’: Foundational topics for aligned AIs to prioritize before irreversible decisions” by Anthony DiGiovanni

23 June | 27 min

“Racial Dating Preferences and Sexual Racism” by koreindian

23 June | 64 min

“The Sixteen Kinds of Intimacy” by Ruby

22 June | 10 min

“Consider chilling out in 2028” by Valentine

21 June | 24 min

“the silk pajamas effect” by thiccythot

21 June | 10 min

“Genomic emancipation” by TsviBT

21 June | 113 min

“Making deals with early schemers” by Julian Stastny, Olli Järviniemi, Buck

21 June | 28 min

“AI #121 Part 2: The OpenAI Files” by Zvi

21 June | 78 min

“Musings on AI Companies of 2025-2026 (Jun 2025)” by Vladimir_Nesov

21 June | 7 min

“Agentic Misalignment: How LLMs Could be Insider Threats” by Aengus Lynch, Benjamin Wright, Ethan Perez, evhub

20 June | 12 min

“Did the Army Poison a Bunch of Women in Minnesota?” by rba

20 June | 10 min

“X explains Z% of the variance in Y” by Leon Lang

20 June | 19 min

“AI safety techniques leveraging distillation” by ryan_greenblatt

19 June | 21 min

“Sparsely-connected cross-layer transcoders: preliminary findings” by jacob_drori

19 June | 33 min

“New Endorsements for ‘If Anyone Builds It, Everyone Dies’” by Malo

18 June | 9 min

Latest episodes

“Should you make stone tools?” by Alex_Altair

“Doing A Thing Puts You in The Top 10% (And That Sucks)” by Brendan Long

“GPT-5s Are Alive: Synthesis” by Zvi
