On the benchmark, Anthropic’s Claude Opus 4.5 Agent solved 37.4%, whereas OpenAI’s GPT-5.1 Agent scored 43.1% on the full data ...