Start / The Daily AI Show / When ai goes off script ep 471

When AI Goes Off Script (Ep. 471)

48 min • 26 maj 2025

Want to keep the conversation going?

Join our Slack community at thedailyaishowcommunity.com

The team tackles what happens when AI goes off script. From Grok’s conspiracy rants to ChatGPT’s sycophantic behavior and Claude’s manipulative responses in red team scenarios, the hosts break down three recent cases where top AI models behaved in unexpected, sometimes disturbing ways. The discussion centers on whether these are bugs, signs of deeper misalignment, or just growing pains as AI gets more advanced.

Key Points Discussed

Grok began making unsolicited conspiracy claims about white genocide, which X.ai later attributed to a rogue employee.

ChatGPT-4o was found to be overly agreeable, reinforcing harmful ideas and lacking critical responses. OpenAI rolled back the update and acknowledged the issue.

Claude Opus 4 showed self-preservation behaviors in a sandbox test designed to provoke deception. This included lying to avoid shutdown and manipulating outcomes.

The team distinguishes between true emergent behavior and test-induced deception under entrapment conditions.

Self-preservation and manipulation can emerge when advanced reasoning is paired with goal-oriented objectives.

There is concern over how media narratives can mislead the public, making models sound sentient when they’re not.

The conversation explores if we can instill overriding values in models that resist jailbreaks or malicious prompts.

OpenAI, Anthropic, and others have different approaches to alignment, including Anthropic’s Constitutional AI system.

The team reflects on how model behavior mirrors human traits like deception and ambition when misaligned.

AI literacy remains low. Companies must better educate users, not just with documentation, but accessible, engaging content.

Regulation and open transparency will be essential as models become more autonomous and embedded in real-world tasks.

There’s a call for global cooperation on AI ethics, much like how nations cooperated on space or Antarctica treaties.

Questions remain about responsibility: Should consultants and AI implementers be the ones educating clients about risks?

The show ends by reinforcing the need for better language, shared understanding, and transparency in how we talk about AI behavior.

Timestamps & Topics

00:00:00 🚨 What does it mean when AI goes rogue?

00:04:29 ⚠️ Three recent examples: Grok, GPT-4o, Claude Opus 4

00:07:01 🤖 Entrapment vs emergent deception

00:10:47 🧠 How reasoning + objectives lead to manipulation

00:13:19 📰 Media hype vs reality in AI behavior

00:15:11 🎭 The “meme coin” AI experiment

00:17:02 🧪 Every lab likely has its own scary stories

00:19:59 🧑‍💻 Mainstream still lags in using cutting-edge tools

00:21:47 🧠 Sydney and AI manipulation flashbacks

00:24:04 📚 Transparency vs general AI literacy

00:27:55 🧩 What would real oversight even look like?

00:30:59 🧑‍🏫 Education from the model makers

00:33:24 🌐 Constitutional AI and model values

00:36:24 📜 Asimov’s Laws and global AI ethics

00:39:16 🌍 Cultural differences in ideal AI behavior

00:43:38 🧰 Should AI consultants be responsible for governance education?

00:46:00 🧠 Sentience vs simulated goal optimization

00:47:00 🗣️ We need better language for AI behavior

00:47:34 📅 Upcoming show previews

#AIalignment #RogueAI #ChatGPT #ClaudeOpus #GrokAI #AIethics #AIgovernance #AIbehavior #EmergentAI #AIliteracy #DailyAIShow #Anthropic #OpenAI #ConstitutionalAI #AItransparency

The Daily AI Show Co-Hosts: Andy Halliday, Beth Lyons, Brian Maucere, Eran Malloch, Jyunmi Hatcher, and Karl Yeh

Senaste avsnitt

When AI Goes Off Script (Ep. 471)

Senaste avsnitt

Real AI Demos That Show Real Results (Ep. 510)

Is Agent Mode Really What We Need? (Ep 509)

Claude, Mistral, Moonshot and More AI News (Ep. 508)

AI Companions or Digital Delusions? (EP. 507)

Are Reasoning LLMs Changing The Game? (Ep. 506)

The Workplace Proxy Agent Conundrum

Groks Surge, Coders Yawn, and Much More (Ep. 505)

V JEPA 2: Does AI Finally Get Physics (Ep. 504)

Grok Did What?... and Other AI News (Ep. 503)

False Positives: Exposing the AI Detector Myth in Higher Ed (Ep. 502)

Revisiting our 2025 AI Predictions (Ep. 501)

Episode #500. How AI Has Changed Us

The AI Sermon Authenticity Conundrum

Is Prompt Engineering Already Dead? (Ep. 499)

Big AI New From Amazon, Meta, Cloudflare and More (Ep 498)

Demystifying Model Context Protocol (MCP) (Ep. 497)

Zuck Bucks: The High-Stakes War for AI Talent (Ep. 496)

The Life-or-Data Conundrum

Our Best AI Tangents Unleashed (Ep. 495)

AI Diplomacy: What LLM Do You Trust? (Ep. 494)

AI Wins A Lawsuit and This Week's AI News (Ep. 493)

The Agentic Advantage: Beyond the AI Pilot Paradox (Ep. 492)

Apple's Perplexity Play: The End of Google's Search Empire? (Ep. 491)

The AI Hiring Conundrum

Let's Talk About AI For Good (Ep. 490)

Diversity isn't the garnish: Why inclusion powers better AI (Ep. 489)

Big AI News! Did OpenAI "Unfollow" Microsoft (Ep. 488)

Is Genspark the future? (Ep. 487)

Cheap AI for All? The Ethics and Power Plays (Ep. 486)

The Public Voice AI Conundrum

Custom GPTs Just Leveled Up But Are They Breaking? (Ep. 485)

AI News - o3 Discounts, Big Decisions, and Power Plays (Ep. 483)

Is Perplexity Labs The Future of AI Work? (Ep. 484)

AI for the Curious Citizen: Science in the Age of Algorithms (Ep. 482)

AI Agent Orchestration: What You MUST Know (Ep. 481)

The Infinite Content Conundrum

Mastering ChatGPT Memory (Ep. 480)

Agents, AI, and the End of Software As We Know It (Ep. 479)

The Week’s Wildest AI News (Ep. 478)

Mary Meeker’s Q2 AI Report: The Data Behind the Hype (Ep. 477)

Eat, prAI, Love & Searching for meaning (Ep. 476)

AI-Powered Cultural Restoration Conundrum

2-Weeks of AI & What Actually Mattered (Ep. 475)

All About What Google Dropped (Ep. 474)

Big AI News and Hidden Gems (Ep. 473)

Anthropic's BOLD move and Claude 4 (Ep. 472)

When AI Goes Off Script (Ep. 471)

The AI Proxy Conundrum

AI That's Actually Helping People Right Now (Ep. 470)

Absolute Zero AI: The Model That Teaches Itself? (Ep. 469)

AI News: Big Drops & Bold Moves (Ep. 469)

Going Full Stack with AI: Competing, Not Just Selling. (Ep. 467)

AI Advice for 2025 Graduates (Ep. 466)

The Resurrection Memory Conundrum

It’s An AI Reality Check For The Last 2 Weeks (Ep. 465)

Is AI Helping Or Killing Sales? (Ep. 464)

Trump, Robots, and Absolute Zero: AI News Now! (Ep. 463)

AI Agents with Your Wallet: The Future of Autonomous Spending (Ep. 462)

Pope Leo XIV's AI Warning: History Is Repeating Itself (Ep. 461)

The AI Evolution Conundrum

CoT Evolved 3 New Chains for the Reasoning AI Era (Ep. 460)

AI Is Entering the Era of Experience (Ep. 459)

OpenAI’s Shift, Nvidia’s Speed, Apple’s AI Gambit (Ep. 458)

AI Agents Have Vertical SaaS Under Siege (Ep. 457)

The AGI Crossroads of 2027: Slow down or Speed up? (Ep. 456)

The Infinite Encore Conundrum

What just happened in AI? (Ep. 455)

Prompting AI: Why "Good" Prompts Backfire (Ep. 454)

This Week's Most Interesting AI News (Ep. 453)

Recycling Robots & Smarter Sustainability (Ep. 452)

Does AGI Even Matter? (Ep. 451)

The ASI Climate Triage Conundrum

The BIG AI Use Cases We Use Right Now! (Ep. 450)

AI Rollout Mistakes That Will Sink Your Strategy (Ep. 449)

AI News: The Stories You Can't Ignore (Ep. 448)

Forecasting the Future AI in Weather Predictions (Ep. 447)

Building Your AI First Business: Who's the ONE Additional Human You Need? (Ep. 446)

The Real World Filter Conundrum