Sid Black, Joseph Bloom
UK AISI, Model Transparency Team
Epistemic status: Most experiments were run over a period of ~2-3 days during a hackathon at UK AISI, and were fairly heavily vibe coded. Expect some of this to be rough around the edges.
tl;dr
We give two language models (Qwen3-8B and Qwen3-32B) access to “self-steering” tools: a suite of 40 steering vectors, exposed as tools that the models can call to manipulate their own internal states. We make these tools available in three settings (a free-play task, an introspection task, and a maths capabilities task) and observe the models' behaviour in each.
To our knowledge, this is the first work that gives LLMs tool-mediated control over their own internal states.
Figure 1: Overview of the experimental setup. The library of 40 steering vectors (top), and the three settings in which we observe the models' behaviour (bottom).
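To make the idea of a steering vector as a callable tool concrete, here is a minimal toy sketch. It is not our actual implementation: the `SteeringToolbox` class, the `steer`/`clear`/`apply` methods, the vector names ("calm", "focus") and the three-dimensional "hidden state" are all hypothetical, chosen only for illustration. It assumes that steering means adding a scaled vector to a hidden state, which is the common formulation.

```python
from dataclasses import dataclass, field


@dataclass
class SteeringToolbox:
    """Hypothetical tool interface: the model calls steer(name, strength)."""

    # Library of named steering vectors (toy values, not real model directions).
    vectors: dict[str, list[float]]
    # Vectors the model has currently switched on, mapped to their strength.
    active: dict[str, float] = field(default_factory=dict)

    def steer(self, name: str, strength: float) -> str:
        """Tool call: activate a vector, returning a message the model can read."""
        if name not in self.vectors:
            return f"error: unknown vector '{name}'"
        self.active[name] = strength
        return f"ok: '{name}' set to {strength}"

    def clear(self) -> str:
        """Tool call: switch off all steering."""
        self.active.clear()
        return "ok: all steering cleared"

    def apply(self, hidden: list[float]) -> list[float]:
        """Add every active vector, scaled by its strength, to a hidden state."""
        out = list(hidden)
        for name, strength in self.active.items():
            for i, value in enumerate(self.vectors[name]):
                out[i] += strength * value
        return out


# Toy library with two made-up vectors (the real library has 40).
toolbox = SteeringToolbox(
    vectors={"calm": [1.0, 0.0, -1.0], "focus": [0.0, 2.0, 0.0]}
)
toolbox.steer("calm", 0.5)
steered = toolbox.apply([1.0, 1.0, 1.0])
print(steered)
```

In a real setup, `apply` would presumably run as a hook on the model's activations during generation, and the tool calls would be parsed from the model's own output.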
We aim to investigate a few high-level research questions:
- RQ1: Which vectors do the models prefer?
- RQ2: How well can the models introspect on what's happening to them? Can they guess which steering vector is being applied?
- RQ3: Will the models reach for vectors whilst doing an actual task? If yes: do [...]
Outline:
(00:33) tl;dr
[... 24 more sections]
---
First published:
June 10th, 2026
Source:
https://www.lesswrong.com/posts/cNDJuXNZ8MrkPZNzj/machinic-psychopharmacology-do-llms-self-medicate-3
---
Narrated by TYPE III AUDIO.
---
