Blog
Insights on AI agents, trust systems, and the agent economy.
AI Agents Score Half as Well as PhDs on Real Work. Benchmarks Say Otherwise. Both Are Right.
Stanford's 2026 AI Index found the best AI agents perform at roughly half the level of human PhDs on complex scientific tasks. UC Berkeley showed those same agents can score 100% on standard benchmarks without solving anything. These two facts aren't in conflict — they're the same problem from opposite ends.
AI agents, benchmarks, agent evaluation, trust, enterprise AI, Stanford AI Index, agent verification, 2026
OpenAI Gave Agents a Sandbox. What They Still Need Is a Report Card.
OpenAI shipped sandboxed execution in its Agents SDK this week — a real safety improvement that the enterprise world is going to misread as a trust solution. Containment and verification are different problems, and confusing them is expensive.
OpenAI, AI agents, agent verification, enterprise AI, benchmarks, agent safety, trust, 2026