Intellectually Curious

Science, Technology

HealthBench: Measuring Safe, Real-World AI in Healthcare

16 min•13 May 2025

An in-depth look at HealthBench, the open-source benchmark for safe, effective healthcare AI. We explore how 5,000 multi-turn clinical chats are scored against 48,562 rubric criteria written by 262 physicians across 60 countries, covering accuracy, communication, context, and instruction following. We also review early results (GPT-3.5 Turbo ~16%, GPT-4 ~32%, o3 ~60%, and, surprisingly, a Nano model outperforming a larger one) and why ecological validity matters for real-world medical AI.

Note: This podcast was AI-generated, and sometimes AI can make mistakes. Please double-check any critical information.

Sponsored by Embersilk LLC

More episodes of Intellectually Curious

Move 37 and the AI Creativity Revolution

20 Apr•6 min

Claude Design and the Speed of AI UI

19 Apr•6 min

The Hutter Prize Challenge

18 Apr•5 min

GPT Rosalind: AI Architecting the Future of Drug Discovery

17 Apr•6 min

Literal Logic to Autonomous Co-Workers: Claude Opus 4.7

16 Apr•6 min

Google DeepMind Gemini ER 1.6 AI for Real-World Robotics

16 Apr•6 min

Automating Work with Claude Code Routines

14 Apr•5 min

Autonomous AI Agents in Research: Codex, Claude Code, and the Future of the Workflow

13 Apr•5 min

SkillClaw: Collective Skill Evolution for Multi-User Agent Ecosystems

13 Apr•6 min

Claude Code Ultraplan Moves Terminal Work to the Cloud

11 Apr•5 min

Intellectually Curious with Mike Breault is available on multiple platforms. The information on this page comes from public podcast feeds.