AI in the shadows: From hallucinations to blackmail

45 min • July 7, 2025

In the first episode of an "AI in the shadows" theme, Chris and Daniel explore the increasingly concerning world of agentic misalignment. Starting with a reminder about hallucinations and reasoning models, they break down how today’s models only mimic reasoning, which can raise serious ethical concerns. They unpack a fascinating (and slightly terrifying) new study from Anthropic, in which agentic AI models were caught simulating blackmail, deception, and even sabotage, all in the name of goal completion and self-preservation.

Featuring:

Chris Benson – Website, LinkedIn, Bluesky, GitHub, X
Daniel Whitenack – Website, GitHub, X

Latest episodes

Workforce dynamics in an AI-assisted world

Reimagining actuarial science with AI

Agentic AI for Drone & Robotic Swarming

AI in the shadows: From hallucinations to blackmail

Finding Nemotron

AI hot takes and debates: Autonomy

Behind-the-Scenes: VC Funding for AI Startups

AI-Automated Film Making

Federated learning in production (part 2)

Federated learning in production (part 1)

Emailing like a superhuman

Model Context Protocol Deep Dive

Seeing beyond the scan in neuroimaging

Open source AI to tackle your backlog

Orchestrating agents, APIs, and MCP servers

Software and hardware acceleration with Groq

AI-assisted coding with GitHub's COO

Optimizing for efficiency with IBM’s Granite

Build a workspace of AI agents

GenAI hot takes and bad use cases

Tool calling and agents

Deep-dive into DeepSeek

Video generation with realistic motion

Mozart to Megadeth at CHRP

Sidekick is an AI Shopify expert

Full-duplex, real-time dialogue with Kyutai

Clones, commerce & campaigns

scikit-learn & data science you own

Creating tested, reliable AI applications

AI is changing the cybersecurity threat landscape

The path towards trustworthy AI

Big data is dead, analytics is alive

Practical workflow orchestration

Towards high-quality (maybe synthetic) datasets

Understanding what's possible, doable & scalable

GraphRAG (beyond the hype)

Pausing to think about scikit-learn & OpenAI o1

Cybersecurity in the GenAI age

AI is more than GenAI

Metrics Driven Development

Threat modeling LLM apps

Only as good as the data

Gaudi processors & Intel's AI portfolio

Broccoli AI at its best 🥦

Hyperventilating over the Gartner AI Hype Cycle

The first real-time voice assistant

Vectoring in on Pinecone

Stanford's AI Index Report 2024

Apple Intelligence & Advanced RAG

The perplexities of information retrieval

Using edge models to find sensitive data

Rise of the AI PC & local LLMs

AI in the U.S. Congress

First impressions of GPT-4o

Full-stack approach for effective AI agents

Autonomous fighter jets?!

Private, open source chat UIs

Mamba & Jamba

Udio & the age of multi-modal AI

RAG continues to rise

Should kids still learn to code?

AI vs software devs

Prompting the future

Generating the future of art & entertainment

YOLOv9: Computer vision is alive and well

Representation Engineering (Activation Hacking)

Leading the charge on AI in National Security

Gemini vs OpenAI

Data synthesis for SOTA LLMs

Large Action Models (LAMs) & Rabbits 🐇

Collaboration & evaluation for LLM apps

Advent of GenAI Hackathon recap

AI predictions for 2024

Open source, on-disk vector search with LanceDB

The state of open source AI

Suspicion machines ⚙️

The OpenAI debacle (a retrospective)

Generating product imagery at Shopify