OpenAI’s o3 Model Scores Lower Than Expected on Independent Benchmark

Saturday, 27 June, 2026

By Isha

OpenAI's o3 AI model, which the company initially claimed could solve over 25% of problems on the challenging FrontierMath benchmark, has scored well below that figure in independent evaluation. Epoch AI, the institute behind FrontierMath, reported that o3 achieved around 10% accuracy, significantly lower than OpenAI's earlier claim. The discrepancy is attributed to differences in testing conditions: OpenAI's internal assessments may have used more computational resources and a different subset of problems.

Read full story at TechCrunch

Tags: openai, benchmark, FrontierMath
