“What would a human pretending to be an AI say?” by Brendan Long

2 min • August 9, 2025

It always feels wrong when people post chats where they ask an LLM questions about its internal experiences, how it works, or why it did something, but I had trouble articulating why beyond a vague, "How could they possibly know that?"[1]. This is my attempt at a better answer:

AI training data comes from humans, not AIs, so every piece of training data for "What would an AI say to X?" is from a human pretending to be an AI. The training data does not contain AIs describing their inner experiences or thought processes. Even synthetic training data only contains AIs predicting what a human pretending to be an AI would say. AIs are trained to predict the training data, not to learn unrelated abilities, so we should expect an AI asked to predict the thoughts of an AI to describe the thoughts of a human pretending to be [...]

The original text contained 2 footnotes which were omitted from this narration.

---

First published:
August 8th, 2025

Source:
https://www.lesswrong.com/posts/Af649z8maCD5mvDy6/what-would-a-human-pretending-to-be-an-ai-say

---

Narrated by TYPE III AUDIO.

---

Images from the article:

Excuse the bad photoshop and inconsistent style, but I couldn't get Gemini/Imagen to one-shot
