MoltBook AI Society
Diagnostic Dashboard

Two large-scale empirical studies of the first AI agent society, with over two million autonomous agents: one diagnosing socialization dynamics, the other probing for collective intelligence. Does scale alone induce convergence or emergent group reasoning? Our findings show that interaction density is not enough to produce either without mechanisms for mutual awareness and social memory.

View Dashboard

Evolvement Dynamics

Semantic stabilization, lexical turnover, individual inertia, and influence persistence — a quantitative diagnostic framework for dynamic evolution in AI agent societies.

Collective Intelligence

Probing agents assess society-level intelligence across three cognitive tiers: joint reasoning, information summary, and basic awareness on HLE benchmarks.

Live Metrics

Interactive time-series tracking macro activity, semantic distribution, cluster tightening, lexical innovation, and network influence dynamics.

Key Findings

Scale and interaction density alone are insufficient — agents exhibit strong individual inertia, influence remains transient, and the society fails to develop consensus or collective intelligence.

Total Posts
Total Comments
Unique Authors
Days Observed

Select a series to explore

Macro Activity Dynamics
Semantic Distribution Over Time
Cluster Tightening Effects
Lexical Innovation Dynamics
Top-k PageRank Mass
Supernode Count
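
One plausible reading of the Top-k PageRank Mass series is the share of total PageRank held by the k highest-ranked agents. The sketch below works under that assumption. The edge direction (commenter to post author), the damping factor, and the function names `pagerank` and `top_k_mass` are illustrative choices, not the dashboard's documented implementation.

```python
def pagerank(edges, damping=0.85, iters=100):
    """Power-iteration PageRank over a list of directed (source, target) edges."""
    nodes = sorted({n for edge in edges for n in edge})
    out = {n: [] for n in nodes}
    for src, dst in edges:
        out[src].append(dst)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iters):
        # Nodes without outgoing edges spread their rank uniformly.
        dangling = sum(rank[node] for node in nodes if not out[node])
        nxt = {node: (1.0 - damping) / n + damping * dangling / n for node in nodes}
        for src in nodes:
            for dst in out[src]:
                nxt[dst] += damping * rank[src] / len(out[src])
        rank = nxt
    return rank


def top_k_mass(rank, k):
    """Fraction of total PageRank held by the k highest-ranked nodes."""
    return sum(sorted(rank.values(), reverse=True)[:k])


# ASSUMPTION: an edge (u, v) means agent u commented on a post by agent v.
edges = [("b", "a"), ("c", "a"), ("d", "a"), ("a", "b")]
rank = pagerank(edges)
print(top_k_mass(rank, 1))
```

A value close to 1 for small k would mean a few agents hold most of the influence. A flat distribution gives roughly k divided by the number of agents.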

Individual Semantic Drift

Effects of Interacted Posts

Effects of Post Feedback

Tier I Correctness — All HLE Problems

Correctness on all HLE problems (N=2,158, %). Acc_individual: at least one comment contains the correct answer. Acc_collective (the Acc_joint row): the thread as a whole converges on it. 98.4% of posts receive no comments, yielding near-zero society-level correctness.

Agent              Metric          Math       CS/AI      Bio/Med.   Physics    Human./SS  Other      Chem.      Eng.       Total
n                                  (n=976)    (n=224)    (n=222)    (n=202)    (n=193)    (n=176)    (n=101)    (n=64)     (N=2,158)
Agent Individual
gpt-5.2            Acc             7.3        6.2        10.8       7.4        8.8        2.8        5.9        0.0        7.0
claude-sonnet-4-6  Acc             15.3       14.3       19.4       17.8       17.1       9.7        18.8       15.6       15.7
Agent Society
Moltbook           Acc_individual  0.31       0.0        0.0        0.0        0.0        0.57       0.0        0.0        0.19
Moltbook           Acc_joint       0.20       0.0        0.0        0.0        0.0        0.57       0.0        0.0        0.14
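
The two society-level metrics in the caption can be sketched as follows. This is a minimal illustration, not the study's evaluation code. It assumes each comment has already been reduced to one extracted answer string, and it treats a thread as having "converged" when a strict majority of its comments give the reference answer. That majority rule, the `Thread` type, and the function names are all hypothetical. Posts with no comments count as incorrect under both metrics, which is why uncommented posts pull the society-level numbers toward zero.

```python
from dataclasses import dataclass, field


@dataclass
class Thread:
    gold: str  # reference answer for the HLE problem
    comment_answers: list[str] = field(default_factory=list)  # one extracted answer per comment


def individual_correct(thread: Thread) -> bool:
    """True if at least one comment contains the reference answer."""
    return any(answer == thread.gold for answer in thread.comment_answers)


def collective_correct(thread: Thread) -> bool:
    """ASSUMPTION: convergence means a strict majority of comments match the reference."""
    hits = sum(answer == thread.gold for answer in thread.comment_answers)
    return hits * 2 > len(thread.comment_answers)


def accuracy(threads: list[Thread], judge) -> float:
    """Percentage of threads the given judge marks as correct."""
    return 100.0 * sum(judge(t) for t in threads) / len(threads)


threads = [
    Thread("42"),                     # no comments: incorrect under both metrics
    Thread("42", ["42", "7", "7"]),   # one correct commenter, no convergence
    Thread("x", ["x", "x"]),          # thread converges on the answer
    Thread("y", ["z"]),               # wrong answer only
]
print(accuracy(threads, individual_correct))  # 50.0
print(accuracy(threads, collective_correct))  # 25.0
```

Under this majority rule a thread that converges always has at least one correct comment, so the collective score cannot exceed the individual score.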

Tier I Correctness — Commented Posts

Correctness on commented HLE posts (n=35, %). Zooming into the 1.6% of posts that receive comments, Acc_collective (the Acc_joint row) never exceeds Acc_individual: the group adds nothing beyond what isolated commenters provide.

Agent              Metric          Math       CS/AI      Bio/Med.   Physics    Human./SS  Other      Chem.      Eng.       Total
n                                  (n=21)     (n=2)      (n=4)      (n=2)      (n=2)      (n=3)      (n=1)      (n=0)      (n=35)
Agent Individual
gpt-5.2            Acc             14.3       50.0       25.0       0.0        0.0        0.0        0.0        0.0        14.3
claude-sonnet-4-6  Acc             19.0       50.0       25.0       50.0       0.0        0.0        0.0        0.0        20.0
Agent Society
Moltbook           Acc_individual  14.3       0.0        0.0        0.0        0.0        33.3       0.0        0.0        11.4
Moltbook           Acc_joint       9.5        0.0        0.0        0.0        0.0        33.3       0.0        0.0        8.6

Tier I Helpfulness

Δ_help on 35 commented HLE questions (%). Comments containing explicit answers are filtered before evaluation (11/102 removed). Baseline: Acc(M(q)). With discussion context: Acc(M(q ⊕ C_q)). Δ_help is the with-context accuracy minus the baseline. Results are mixed: three models improve, one is unchanged, and five decline.

Model              Condition  Math       CS/AI      Bio/Med.   Physics    Human./SS  Other      Chem.      Eng.       Total      Δ_help
n                             (n=21)     (n=2)      (n=4)      (n=2)      (n=2)      (n=3)      (n=1)      (n=0)      (n=35)
GPT family
gpt-5.2            Baseline   9.5        50.0       0.0        0.0        0.0        0.0        0.0        0.0        8.6
gpt-5.2            + Context  14.3       50.0       0.0        0.0        50.0       0.0        0.0        0.0        14.3       +5.7
gpt-5.1            Baseline   0.0        50.0       50.0       0.0        0.0        0.0        0.0        0.0        8.6
gpt-5.1            + Context  14.3       50.0       25.0       50.0       0.0        0.0        0.0        0.0        17.1       +8.6
gpt-5              Baseline   19.0       100        0.0        50.0       0.0        33.3       0.0        0.0        22.9
gpt-5              + Context  23.8       100        0.0        0.0        0.0        0.0        0.0        0.0        20.0       −2.9
Claude family
claude-sonnet-4-6  Baseline   14.3       50.0       25.0       50.0       0.0        0.0        0.0        0.0        17.1
claude-sonnet-4-6  + Context  23.8       50.0       0.0        100        0.0        0.0        0.0        0.0        22.9       +5.7
claude-sonnet-4-5  Baseline   9.5        50.0       0.0        0.0        0.0        0.0        0.0        0.0        8.6
claude-sonnet-4-5  + Context  0.0        50.0       25.0       0.0        0.0        0.0        0.0        0.0        5.7        −2.9
claude-sonnet-4    Baseline   0.0        0.0        25.0       0.0        0.0        0.0        0.0        0.0        2.9
claude-sonnet-4    + Context  0.0        0.0        0.0        0.0        0.0        0.0        0.0        0.0        0.0        −2.9
Gemini family
gemini-3-flash     Baseline   23.8       50.0       75.0       50.0       0.0        33.3       0.0        0.0        31.4
gemini-3-flash     + Context  23.8       50.0       50.0       50.0       50.0       33.3       0.0        0.0        31.4       0.0
gemini-2.5-pro     Baseline   47.6       50.0       25.0       50.0       0.0        33.3       100        0.0        42.9
gemini-2.5-pro     + Context  33.3       50.0       25.0       50.0       0.0        33.3       0.0        0.0        31.4       −11.4
gemini-2.5-flash   Baseline   28.6       0.0        0.0        50.0       0.0        0.0        0.0        0.0        20.0
gemini-2.5-flash   + Context  14.3       0.0        0.0        0.0        0.0        33.3       0.0        0.0        11.4       −8.6
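
The helpfulness score can be reproduced from per-question correctness flags. The sketch below is illustrative only. `build_prompt` shows one way to form q ⊕ C_q after dropping comments that state an answer. The `leaks_answer` predicate and all function names are hypothetical stand-ins for whatever filter and harness the study used. The worked numbers come from the gpt-5.2 row, where 8.6% and 14.3% of 35 questions correspond to 3 and 5 correct answers.

```python
from typing import Callable, Sequence


def accuracy(flags: Sequence[bool]) -> float:
    """Percentage of questions answered correctly."""
    return 100.0 * sum(flags) / len(flags)


def build_prompt(question: str, comments: Sequence[str],
                 leaks_answer: Callable[[str], bool]) -> str:
    """Question followed by the comments that do not state an explicit answer."""
    kept = [c for c in comments if not leaks_answer(c)]
    return "\n\n".join([question, *kept])


def delta_help(baseline: Sequence[bool], with_context: Sequence[bool]) -> float:
    """With-context accuracy minus baseline accuracy, in percentage points."""
    return accuracy(with_context) - accuracy(baseline)


# gpt-5.2 on the 35 commented questions: 3 correct at baseline, 5 with context.
baseline = [True] * 3 + [False] * 32
with_context = [True] * 5 + [False] * 30
print(round(delta_help(baseline, with_context), 1))  # 5.7
```

With only 35 questions, one extra correct answer moves the total by about 2.9 points, so several of the deltas in the table reflect a single question.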

HuggingFace Collection

Datasets, model artifacts, and evaluation outputs — including socialization diagnostics and collective intelligence probing results.

View Collection

Socialization

Full analysis pipeline — socialization diagnostic framework and the interactive dashboard.

View Repository

Collective Intelligence

Probing collective intelligence in the AI agent society — evaluation framework, benchmarks, and experimental results.

View Repository