In this solo episode of Risky Business Features, James Wilson explores how distillation techniques are both a legitimate way to train smaller models and a way to steal model capabilities. It’s not just a problem for frontier labs: any LLM-based product could have its competitive advantage stolen through these attacks.
James covers:
- High-level concept of distillation
- Why it matters, including the closed-weight vs. open-weight vs. open-source distinction
- Types of distillation and the prompts used
- The distillation pipeline end to end
- Distillation at scale and mitigation techniques
- Hardware resource constraints for distillation
- Self-Instruct: Aligning Language Models with Self-Generated Instructions
- Alpaca: A Strong, Replicable Instruction-Following Model
- Vicuna: An Open-Source Chatbot Impressing GPT-4 with 90%* ChatGPT Quality
- Orca: Progressive Learning from Complex Explanation Traces of GPT-4
- Zephyr: Direct Distillation of LM Alignment
- Stealing Part of a Production Language Model
- Microsoft probes if DeepSeek-linked group improperly obtained OpenAI data, Bloomberg News reports
- Detecting and preventing distillation attacks
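
The distillation pipeline the episode walks through can be sketched in a few lines: prompts are sent to a teacher model, the (instruction, response) pairs are harvested, and the pairs become supervised fine-tuning data for a smaller student. This is a minimal illustration, not the episode's exact method; the teacher call is a stub standing in for a real model API, and all function names here are illustrative.

```python
# Sketch of a distillation data pipeline: harvest teacher outputs, then
# format (instruction, response) pairs as fine-tuning records for a student.

def query_teacher(prompt: str) -> str:
    """Stub standing in for an API call to the teacher model."""
    return f"Teacher answer to: {prompt}"

def harvest(prompts):
    """Step 1: collect instruction/response pairs from the teacher."""
    return [{"instruction": p, "response": query_teacher(p)} for p in prompts]

def to_sft_records(pairs):
    """Step 2: format pairs as supervised fine-tuning text for the student."""
    return [
        f"### Instruction:\n{d['instruction']}\n### Response:\n{d['response']}"
        for d in pairs
    ]

prompts = ["Explain model distillation.", "Summarize the Alpaca approach."]
dataset = to_sft_records(harvest(prompts))
print(len(dataset))  # 2
```

Mitigations discussed in the episode (rate limiting, output watermarking, query-pattern detection) target exactly this harvesting step, since it requires issuing many structured queries to the teacher.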
