Future of Life Institute Podcast

Why AI Evaluation Science Can't Keep Up (with Carina Prunkl)

54 min · 17 April 2026

Carina Prunkl is a researcher at Inria. She joins the podcast to discuss how to assess the capabilities and risks of general-purpose AI. We examine why systems can solve hard coding and math problems yet still fail at simple tasks, why pre-deployment tests often miss real-world behavior, and how faster capability gains can increase misuse risks. The conversation also covers de-skilling, red teaming, layered safeguards, and warning signs that AIs might undermine oversight.

CHAPTERS:

(00:00) Episode Preview

(01:04) Introducing the report

(02:10) Jagged frontier capabilities

(05:29) Formal reasoning progress

(12:36) Risks and evaluation science

(19:00) Funding evaluation capacity

(24:03) Autonomy and de-skilling

(31:32) Authenticity and AI companions

(41:00) Defense in depth methods

(48:34) Loss of control risks

(53:16) Where to read report

PRODUCED BY:

https://aipodcast.ing

SOCIAL LINKS:

Website: https://podcast.futureoflife.org

Twitter (FLI): https://x.com/FLI_org

Twitter (Gus): https://x.com/gusdocker

LinkedIn: https://www.linkedin.com/company/future-of-life-institute/

YouTube: https://www.youtube.com/channel/UC-rCCy3FQ-GItDimSR9lhzw/

Apple: https://geo.itunes.apple.com/us/podcast/id1170991978

Spotify: https://open.spotify.com/show/2Op1WO3gwVwCrYHg4eoGyP


Future of Life Institute Podcast by Future of Life Institute is available on several platforms. The information on this page comes from public podcast feeds.