Sveriges mest populära poddar
Build Wiz AI Show

Catching AI Sleeper Agent - LLM Backdoors

16 min5 februari 2026

Could your trusted AI model be a hidden "sleeper agent" just waiting for a secret command to turn malicious? We explore a new methodology that extracts and reconstructs backdoor triggers by exploiting the surprising fact that these models often strongly memorize their own poisoning data. Tune in to discover how this inference-only scanner can unmask hidden threats across various LLMs without needing any prior knowledge of the attacker’s specific trigger or target behavior.

Source: https://arxiv.org/pdf/2602.03085

Build Wiz AI Show med Build Wiz AI finns tillgänglig på flera plattformar. Informationen på denna sida kommer från offentliga podd-flöden.