Rank 1: OpenAI o3 (OpenAI)
- SWE-bench Verified: 69.1
- SWE-bench Pro: —
- SWE-bench Live: —
- Terminal-Bench 2.1: 37.1
- SWE-Marathon: —
- FeatureBench Full: —
- LiveCodeBench v6: 80.8
- Aider Polyglot Benchmark: 81.3
Flagship leaderboard
Programming and agentic coding benchmarks for frontier language models.
Updated 368d ago · Rankings show governed observations with source provenance.
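| Rank | Model | Provider | SWE-bench Verified | SWE-bench Pro | SWE-bench Live | Terminal-Bench 2.1 | SWE-Marathon | FeatureBench Full | LiveCodeBench v6 | Aider Polyglot Benchmark |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | OpenAI o3 | OpenAI | 69.1 | — | — | 37.1 | — | — | 80.8 | 81.3 |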