LessWrong (Curated & Popular)

[HUMAN VOICE] "Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training" by evhub et al.

9 min, 20 January 2024

This is a linkpost for https://arxiv.org/abs/2401.05566

Support ongoing human narrations of LessWrong's curated posts:
www.patreon.com/LWCurated

Source:
https://www.lesswrong.com/posts/ZAsJv7xijKTfZkMtr/sleeper-agents-training-deceptive-llms-that-persist-through

Narrated for LessWrong by Perrin Walker.


[Curated Post]
[125+ Karma Post]
