Intellectually Curious

HealthBench: Measuring Safe, Real-World AI in Healthcare

16 min · 13 May 2025
An in-depth look at HealthBench, the open-source benchmark for safe, effective healthcare AI. We explore how 5,000 multi-turn clinical conversations are scored against 48,562 rubric criteria written by 262 physicians across 60 countries, covering accuracy, communication, context, and instruction following. We also review early results (GPT-3.5 Turbo ~16%, GPT-4o ~32%, o3 ~60%, and the surprising result that GPT-4.1 nano outperforms the larger GPT-4o) and why ecological validity matters for real-world medical AI.


Note: This podcast was AI-generated, and sometimes AI can make mistakes. Please double-check any critical information.

Sponsored by Embersilk LLC

Intellectually Curious with Mike Breault is available on several platforms. The information on this page comes from public podcast feeds.