
Kaggle AGI Progress 2026: Optimizer Benchmark


The competition

Measuring Progress Toward AGI - Cognitive Abilities was a featured hackathon run by Google DeepMind and Kaggle. The premise: today’s general benchmarks conflate recall with reasoning, so it’s hard to tell whether a frontier model is genuinely solving a novel problem or pattern-matching against its training data. Entrants were asked to build benchmarks targeting one of five cognitive faculties drawn from DeepMind’s paper Measuring progress toward AGI: A cognitive framework — learning, metacognition, attention, executive functions, or social cognition — so that progress toward AGI becomes something you can actually measure rather than argue about.

I entered it — my first Kaggle competition in a while — on the Executive Functions track. My benchmark frames LLM planning as employee shift scheduling: a constraint-optimization problem where OR-Tools can compute a verifiably optimal answer, so a model’s output can be scored on a continuous 0–100 scale against ground truth instead of a binary pass/fail. Because instances are generated programmatically and scale cleanly from 105 to 3,360 assignment slots, the benchmark isolates planning from general reasoning and shows where each model’s planning budget runs out — which is exactly the kind of cognitive profile the competition asked for.

Results by tier

| Instance tier | Mean score (all models) |
|---------------|-------------------------|
| Small         | 81.5                    |
| Medium        | 58.3                    |
| Large         | 28.6                    |

Top of the leaderboard

| # | Model                  | Avg  | Small | Medium | Large |
|---|------------------------|------|-------|--------|-------|
| 1 | gemini-3.1-pro-preview | 84.2 | 100.0 | 98.9   | 53.8  |
| 2 | gpt-5.4-2026-03-05     | 71.4 | 97.7  | 85.4   | 31.2  |
| 3 | gemma-4-31b-it         | 71.1 | 99.9  | 73.9   | 39.4  |
| 4 | claude-opus-4-6        | 63.1 | 88.1  | 53.6   | 47.6  |