This is a link post. Summary: We propose measuring AI performance in terms of the length of tasks AI agents can complete. We show that this metric has increased exponentially and consistently over the past 6 years, with a doubling time of around 7 months. Extrapolating this trend predicts that, in under a decade, AI agents will be able to independently complete a large fraction of software tasks that currently take humans days or weeks.
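To make the extrapolation in the summary concrete, here is a minimal Python sketch of the arithmetic behind a fixed doubling time. Only the 7-month doubling time comes from the summary. The function names and any starting or target task lengths passed in are hypothetical, chosen for illustration, and the sketch assumes the trend continues unchanged.

```python
import math

# Doubling time of roughly 7 months, as stated in the summary.
DOUBLING_MONTHS = 7.0


def growth_factor(months, doubling_months=DOUBLING_MONTHS):
    """Factor by which the completable task length grows after `months`,
    assuming a constant doubling time (pure exponential growth)."""
    return 2.0 ** (months / doubling_months)


def months_to_reach(start_hours, target_hours, doubling_months=DOUBLING_MONTHS):
    """Months needed for the task length to grow from `start_hours` to
    `target_hours` under the same assumption. Both inputs are hypothetical
    values supplied by the caller, not figures from the post."""
    if start_hours <= 0 or target_hours <= 0:
        raise ValueError("task lengths must be positive")
    return doubling_months * math.log2(target_hours / start_hours)
```

For example, `growth_factor(120)` gives the multiplier over ten years, which is on the order of 10^5, and `months_to_reach(1, 8)` returns 21 months, since going from a hypothetical 1-hour task to an 8-hour task takes three doublings.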
Full paper | Github repo
---
First published:
March 19th, 2025
Source:
https://www.lesswrong.com/posts/deesrjitvXM4xYGZd/metr-measuring-ai-ability-to-complete-long-tasks
Linkpost URL:
https://metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks/
---
Narrated by TYPE III AUDIO.
---
Images from the article:
Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.
