Could your trusted AI model be a hidden "sleeper agent" just waiting for a secret command to turn malicious? We explore a new methodology that extracts and reconstructs backdoor triggers by exploiting the surprising fact that these models often strongly memorize their own poisoning data. Tune in to discover how this inference-only scanner can unmask hidden threats across various LLMs without needing any prior knowledge of the attacker’s specific trigger or target behavior.
Source: https://arxiv.org/pdf/2602.03085
Build Wiz AI Show, by Build Wiz AI, is available on multiple platforms.
