Audio note: this article contains 59 uses of LaTeX notation, so the narration may be difficult to follow. There's a link to the original text in the episode description.
Summary: Both our (UK AISI's) debate safety case sketch and Anthropic's research agenda point at systematic human error as a weak point for debate. This post talks through how one might strengthen a debate protocol to partially mitigate this.
Not too many errors in unknown places
The complexity theory models of debate assume some expensive verifier machine _M_ with access to a human oracle, such that
Typically, _M_ is some recursive tree computation, where for simplicity we can think of human oracle queries as occurring at the leaves [...]
---
Outline:
(00:39) Not too many errors in unknown places
(04:01) A protocol that handles an _ε_-fraction of errors
(05:26) What distribution do we measure errors against?
(06:43) Cross-examination-like protocols
(08:27) Collaborate with us
---
First published:
May 14th, 2025
---
Narrated by TYPE III AUDIO.