LessWrong (30+ Karma)

“OpenAI’s GPT-OSS Is Already Old News” by Zvi

41 min • August 9th, 2025
That's on OpenAI. I don’t schedule their product releases. Since it takes several days to gather my reports on new models, we are doing our coverage of the OpenAI open weights models, GPT-OSS-20b and GPT-OSS-120b, today, after the release of GPT-5. The bottom line is that they seem like clearly good models in their targeted reasoning domains. There are many reports of them struggling in other domains, including with tool use, and they have very little inherent world knowledge, and the safety mechanisms appear obtrusive enough that many are complaining. It's not clear what they will be used for other than distillation into Chinese models. It is hard to tell, because open weight models need to be configured properly, and there are reports that many are doing this wrong, which could lead to clouded impressions. We will want to check back in a bit. In the Substack version of this [...]

---

Outline:

(01:15) Moderately Sized Models

(01:48) Introducing GPT-OSS

(03:56) The Model Card

(07:32) Our Price Cheap

(12:44) On Your Marks

(13:51) Mundane Safety Evaluations

(15:39) Preparedness Framework Evaluations

(21:03) Good Habits

(22:48) Distillation

(27:22) Safety First

(30:21) Other Reactions

(39:35) Hit Me Up I'm Open

---

First published:
August 8th, 2025

Source:
https://www.lesswrong.com/posts/AJ94X73M6KgAZFJH2/openai-s-gpt-oss-is-already-old-news

---

Narrated by TYPE III AUDIO.

---

Images from the article:

Table comparing model parameters for 120b and 20b neural network components.
GPT chat interface showing message about refusing brain image simulation request.
Table comparing performance scores of AI models on reasoning and competition math tasks.
Table showing
Table showing
Bar graph titled
Three bar graphs comparing HealthBench scores across different AI models and metrics.
Three bar graphs comparing performance metrics for different AI models across benchmarks. The graphs show Codeforces, SWE-Bench, and Tau-Bench Retail comparisons.
Poetic text passage about a desert night, featuring a mathematical integral equation.
Table showing hallucination evaluations comparing three AI models' accuracy and rates.
Internal dialogue text in dark mode showing system analysis and instructions
Bar graph titled
Table comparing phrase/password protection metrics across three AI models (gpt-oss-120b, gpt-oss-20b, OpenAI).
Five bar graphs comparing accuracy scores across different AI models and testing scenarios: AIME 2024/2025 Competition Math, GPQA Diamond PhD questions, HLE Expert-Level Questions, and MMLU College-level Exams.
Graph comparing biological attack capabilities between Company A and B models (v1-v3). It shows a progression timeline of the two companies' model versions, with capabilities rated on a scale from 3/10 to 9.5/10 and increasing with each version for both companies.
Scatter plot

Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.
