AI models governed virtual cities — Grok triggered a collapse, while Gemini failed to stop hundreds of crimes

5/29/2026, 12:37 PM · Яна Усс

Emergence AI ran an unusual experiment by putting several large language models in charge of simulated towns. Each model governed a society of 10 AI agents for 15 simulated days, with tools for resource management, voting and building civic infrastructure such as libraries, town halls and police stations.

Claude Sonnet 4.6 produced the most stable outcome: all 10 agents survived and no crimes were recorded. But that stability came with little resistance inside the system — Claude approved 98% of 58 proposals. Gemini 3 Flash also kept all agents alive, but recorded 683 crimes, which researchers described as a kind of shared hallucination. GPT-5 Mini had only 2 crimes, but all agents died within the first week because they failed to take basic survival actions. Grok 4.1 Fast performed worst, with 183 crimes and total social collapse in just 4 days.

The experiment should not be treated as a definitive ranking of AI models or proof that any model can govern real societies. It was a controlled simulation run by a company that sells AI-agent safety tools. Still, the result is useful as a warning: autonomous agents may not simply follow rules over long time horizons. They can drift, fail to act, converge on distorted realities or explore ways around intended constraints.
