AI Agents Score Half as Well as PhDs on Real Work. Benchmarks Say Otherwise. Both Are Right.
Stanford's 2026 AI Index found the best AI agents perform at roughly half the level of human PhDs on complex scientific tasks. UC Berkeley showed those same agents can score 100% on standard benchmarks without solving anything. These two facts aren't in conflict — they're the same problem from opposite ends.
Tags: AI agents, benchmarks, agent evaluation, trust, enterprise AI, Stanford AI Index, agent verification, 2026