"Machinic Psychopharmacology: Do LLMs Self-Medicate?" by Sid Black, Joseph Bloom

UK AISI, Model Transparency Team

Epistemic status: Most experiments were run over a period of ~2-3 days during a hackathon at UK AISI, and were fairly heavily vibe coded. Expect some of this to be rough around the edges.

tl;dr

We give two language models (Qwen3-8B and Qwen3-32B) access to “self-steering” tools: a suite of 40 steering vectors, exposed as tools that the models can call to manipulate their own internal states. We make these tools available in three settings (a free-play task, an introspection task, and a maths capabilities task) and observe the models' behaviour in each.
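To make the idea of a “self-steering” tool concrete, here is a minimal sketch in Python. It is not the authors' implementation. It assumes a steering vector is simply added to a hidden state, scaled by a strength the model chooses. The vector names (`calm`, `focus`), the 4-dimensional hidden state, and the `SteeringState` class are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical library of named steering vectors. The post uses 40;
# two toy 4-dimensional vectors are shown here for illustration only.
STEERING_LIBRARY = {
    "calm": [0.5, 0.0, -0.5, 0.0],
    "focus": [0.0, 1.0, 0.0, -1.0],
}


@dataclass
class SteeringState:
    """Tracks which vectors are switched on, and at what strength."""

    active: dict = field(default_factory=dict)

    def call_tool(self, name: str, strength: float) -> str:
        """Tool entry point a model could call; returns a text result."""
        if name not in STEERING_LIBRARY:
            return f"error: unknown vector '{name}'"
        if strength == 0:
            # A strength of zero is treated as "switch this vector off".
            self.active.pop(name, None)
            return f"removed '{name}'"
        self.active[name] = strength
        return f"applied '{name}' at strength {strength}"

    def steer(self, hidden: list) -> list:
        """Add every active vector, scaled by its strength, to a hidden state."""
        out = list(hidden)
        for name, strength in self.active.items():
            vec = STEERING_LIBRARY[name]
            out = [h + strength * v for h, v in zip(out, vec)]
        return out


state = SteeringState()
print(state.call_tool("calm", 2.0))       # applied 'calm' at strength 2.0
print(state.steer([1.0, 1.0, 1.0, 1.0]))  # [2.0, 1.0, 0.0, 1.0]
```

In a real model the addition would presumably happen inside the forward pass at a chosen layer, rather than on a plain Python list; that detail is left out here.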

To our knowledge, this is the first work that gives LLMs tool-mediated control over their own internal states.

Figure 1: Overview of the experimental setup. The library of 40 steering vectors (top), and the three settings in which we observe the models' behaviour (bottom).

We aim to investigate a few high-level research questions:

RQ1: Which vectors do the models prefer?
RQ2: How well can the models introspect on what's happening to them? Can they guess which steering vector is being applied?
RQ3: Will the models reach for vectors whilst doing an actual task? If yes: do [...]

---

Outline:

(00:33) tl;dr

[... 24 more sections]

---

First published:
June 10th, 2026

Source:
https://www.lesswrong.com/posts/cNDJuXNZ8MrkPZNzj/machinic-psychopharmacology-do-llms-self-medicate-3

---

Narrated by TYPE III AUDIO.
