Will's Blog: OpenAI releases GDPval, a benchmark to test AI performance on "economically valuable, real-world tasks", and says Claude Opus 4.1 was the best performing model (Maxwell Zeff/TechCrunch)

25 September 2025


Maxwell Zeff / TechCrunch:
OpenAI released a new benchmark on Thursday that tests how its AI models perform compared to human professionals across a wide range of industries and jobs.
