How AI Safety Testing Works: Leonard Tang on Red Teaming ChatGPT and Claude

Leonard Tang, founder and CEO of Haize Labs, joins Thinking on Paper to explain how AI models are tested for safety, reliability and predictable behaviour.

Systems such as ChatGPT and Claude can perform well under standard evaluations while still failing in unexpected ways when deployed in healthcare, finance, education or other high-stakes environments. Haize Labs develops testing and evaluation tools designed to identify those vulnerabilities before AI systems reach users.

In this episode, we discuss:

How AI safety testing and red teaming work
What Haize Labs means by an AI robustness and safety layer
How researchers uncover hidden failure modes in language models
Why benchmark performance doesn’t guarantee real-world reliability
How adversarial testing exposes weaknesses before deployment
Why different industries need specific AI safety standards
How organisations can evaluate whether an AI model is suitable for a particular use case
The difference between Silicon Valley’s AI claims and adoption in established industries
How language models could affect communication across cultures
What it means to align an AI system with human needs

Leonard explains why AI safety can’t be reduced to a single test or universal code of conduct. A model used for medical advice faces different risks from one used in education, financial services or customer support.

This conversation examines how developers and organisations can test AI systems more rigorously, identify failures before deployment and build models that behave more reliably in real-world conditions.

TIMESTAMPS

(00:00) - Disruptors and Curious Minds
(01:07) - Our Sponsor: Conviction
(01:50) - Introducing Leonard Tang: AI CEO and Founder
(03:37) - The Importance of AI Safety: What’s at Stake in AI Development?
(06:21) - Using Mathematics and Modeling to Understand Human Behaviour in AI
(08:12) - Why Are Technologists So Often Musicians?
(11:06) - Language, Culture, and AI
(17:05) - Common Misconceptions About AI: What People Get Wrong
(19:20) - The Dartmouth Conference: Birth of AI and Its Lasting Impact
(19:55) - Claude and ChatGPT Pre-training: What Do The Models Go Through?
(25:20) - An Alan Watts AI Model for Enhanced Understanding
(28:33) - Claude vs ChatGPT: Comparing AI Models and Performance
(31:44) - AI Jailbreak Detection
(33:25) - How Dreamlike Images Enhance AI Safety and Trustworthiness
(38:20) - Top-Down vs Bottom-Up AI Development: Approaches to Building Safer AI
(42:55) - Protecting Artists, Intellectual Property, and Art in the Age of AI
(48:20) - Developing an AI Code of Conduct for Ethical AI Usage
(49:45) - A Message for Veteran AI Stars
(52:35) - Restructuring Education for Critical Thinking in the Age of AI
(54:16) - Book Club Live

--

Quotes from the show:

"We need to rigorously test AI models to discover all their vulnerabilities, failure modes, and gotchas before they get deployed in production."

"AI is a technology of language, and inevitably, it will empower us to merge cultures."

"We’re trying to get AI to be a little more mature, a little more sophisticated, and just more reliable."

"What we’re interested in is enforcing an AI code of conduct for specific applications, making AI systems tightly aligned with the needs of their use cases."

"People in legacy industries are underestimating AI’s potential, while Silicon Valley is often overhyping it."

--

🔗 More:

Visit Haize Labs: https://haizelabs.com/

Visit Thinking On Paper: https://www.thinkingonpaper.xyz/

Instagram

Twitter
