MMLU-Pro holds steady at 85.0, AIME 2025 improves slightly to 89.3, and GPQA-Diamond dips from 80.7 to 79.9. Coding and agent benchmarks tell a similar story, with Codeforces ratings rising from ...
Lauren Chan is a Who What Wear editor in residence, a Canadian model, a former award-winning fashion editor at Glamour, and the founder of Henning, a luxury plus-size clothing label.