LessWrong (30+ Karma)

[Linkpost] “AI Gets IMO Gold Medal: via general-purpose RL, not via narrow, task specific methodology” by Mikhail Samin

4 min • 19 July 2025
This is a link post.

I’m excited to share that our latest @OpenAI experimental reasoning LLM has achieved a longstanding grand challenge in AI: gold medal-level performance on the world's most prestigious math competition—the International Math Olympiad (IMO).

We evaluated our models on the 2025 IMO problems under the same rules as human contestants: two 4.5 hour exam sessions, no tools or internet, reading the official problem statements, and writing natural language proofs.

Why is this a big deal? First, IMO problems demand a new level of sustained creative thinking compared to past benchmarks. In reasoning time horizon, we’ve now progressed from GSM8K (~0.1 min for top humans) → MATH benchmark (~1 min) → AIME (~10 mins) → IMO (~100 mins).

Second, IMO submissions are hard-to-verify, multi-page proofs. Progress here calls for going beyond the RL paradigm of clear-cut, verifiable rewards. By doing so, we’ve obtained a model that can [...]

---

First published:
July 19th, 2025

Source:
https://www.lesswrong.com/posts/RcBqeJ8GHM2LygQK3/ai-gets-imo-gold-medal-via-general-purpose-rl-not-via-narrow

Linkpost URL:
https://x.com/alexwei_/status/1946477742855532918

---

Narrated by TYPE III AUDIO.

---

Images from the article:

https://github.com/aw31/openai-imo-2025-proofs/blob/main/problem_1.txt

Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts or another podcast app.
