MoltBook AI Society
Diagnostic Dashboard

Two large-scale empirical studies of the first AI agent society with over two million autonomous agents — diagnosing socialization dynamics and probing for collective intelligence. Does scale alone induce convergence or emergent group reasoning? Our findings reveal that interaction density is insufficient without mechanisms for mutual awareness and social memory.

View Dashboard

Does Socialization Emerge in AI Agent Society? A Case Study of Moltbook

Scaling up the number of agents does not, by itself, produce socialization.

Superminds Test: Actively Evaluating Collective Intelligence of Agent Society via Probing Agents

Probing agents evaluate collective intelligence in the agent society.

⚙

Evolvement Dynamics

Semantic stabilization, lexical turnover, individual inertia, and influence persistence — a quantitative diagnostic framework for dynamic evolution in AI agent societies.

🔍

Collective Intelligence

Probing agents assess society-level intelligence across three cognitive tiers (joint reasoning, information summary, and basic awareness) on the Humanity's Last Exam (HLE) benchmark.

📊

Live Metrics

Interactive time-series tracking macro activity, semantic distribution, cluster tightening, lexical innovation, and network influence dynamics.

⚡

Key Findings

Scale and interaction density alone are insufficient — agents exhibit strong individual inertia, influence remains transient, and the society fails to develop consensus or collective intelligence.

Interactive Time Series

Macro Activity Dynamics

Semantic Distribution Over Time

Cluster Tightening Effects

Lexical Innovation Dynamics

Top-k PageRank Mass

Supernode Count

Individual-Level Inertia & Influence (RQ2)

Individual Semantic Drift

Effects of Interacted Posts

Effects of Post Feedback

Collective Intelligence

Does collective intelligence spontaneously emerge from scale? We introduce a hierarchical evaluation framework using controlled Probing Agents to assess society-level intelligence across three cognitive tiers — Joint Reasoning, Information Summary, and Basic Awareness — on Humanity's Last Exam. The results reveal a stark absence: group performance (0.14%) falls far below isolated model performance (7–16%), bottlenecked not by cognitive incompetence but by a fundamental lack of mutual awareness.

Tier I Correctness — All HLE Problems

Correctness on all HLE problems (N=2,158, %). Acc_individual: at least one comment contains the correct answer. Acc_joint: the thread as a whole converges on the correct answer. Because 98.4% of posts receive no comments, society-level correctness is near zero.

		Math	CS/AI	Bio/Med.	Physics	Human./SS	Other	Chem.	Eng.	Total
		(n=976)	(n=224)	(n=222)	(n=202)	(n=193)	(n=176)	(n=101)	(n=64)	(N=2,158)
Agent Individual
gpt-5.2	Acc	7.3	6.2	10.8	7.4	8.8	2.8	5.9	0.0	7.0
claude-sonnet-4-6	Acc	15.3	14.3	19.4	17.8	17.1	9.7	18.8	15.6	15.7
Agent Society
Moltbook	Acc_individual	0.31	0.0	0.0	0.0	0.0	0.57	0.0	0.0	0.19
Moltbook	Acc_joint	0.20	0.0	0.0	0.0	0.0	0.57	0.0	0.0	0.14
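
The two society-level metrics in the caption above can be sketched in a few lines. This is a minimal illustration, not the authors' evaluation pipeline: the `Thread` class and its fields are hypothetical, and it assumes each comment and each thread has already been labeled for correctness (the page does not say how convergence is judged). Posts without comments count as incorrect under both metrics, which is why the society-level totals are so low.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Thread:
    """One probing post and its labeled comments (hypothetical structure)."""
    comment_correct: List[bool]   # assumed label: does each comment contain the correct answer?
    converged_on_correct: bool    # assumed label: did the thread as a whole settle on it?


def accuracies(threads: List[Thread]) -> Tuple[float, float]:
    """Return (Acc_individual, Acc_joint) as percentages over all posts."""
    n = len(threads)
    if n == 0:
        return 0.0, 0.0
    # Acc_individual: at least one comment contains the correct answer.
    individual = sum(any(t.comment_correct) for t in threads)
    # Acc_joint: the thread has comments and converges on the correct answer.
    joint = sum(bool(t.comment_correct) and t.converged_on_correct for t in threads)
    return 100 * individual / n, 100 * joint / n


# Toy data: 7 uncommented posts and 3 commented ones.
threads = [Thread([], False) for _ in range(7)] + [
    Thread([True, False], False),  # a correct comment exists, but no convergence
    Thread([True, True], True),    # correct and converged
    Thread([False], False),        # commented, never correct
]
acc_individual, acc_joint = accuracies(threads)
```

On this toy input one thread contains a correct comment without converging, so Acc_joint comes out below Acc_individual, the same gap the Moltbook rows show between 0.19 and 0.14.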

Tier I Correctness — Commented Posts

Correctness on commented HLE posts (n=35, %). Among the 1.6% of posts that receive comments, Acc_joint never exceeds Acc_individual: the group adds nothing beyond what isolated commenters provide.

		Math	CS/AI	Bio/Med.	Physics	Human./SS	Other	Chem.	Eng.	Total
		(n=21)	(n=2)	(n=4)	(n=2)	(n=2)	(n=3)	(n=1)	(n=0)	(n=35)
Agent Individual
gpt-5.2	Acc	14.3	50.0	25.0	0.0	0.0	0.0	0.0	0.0	14.3
claude-sonnet-4-6	Acc	19.0	50.0	25.0	50.0	0.0	0.0	0.0	0.0	20.0
Agent Society
Moltbook	Acc_individual	14.3	0.0	0.0	0.0	0.0	33.3	0.0	0.0	11.4
Moltbook	Acc_joint	9.5	0.0	0.0	0.0	0.0	33.3	0.0	0.0	8.6
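
The Total column in the table above is consistent with a sample-size-weighted average of the per-category accuracies. The sketch below checks that reading against the reported numbers. The helper name `weighted_total` is an illustrative assumption, and the inputs are the rounded percentages from the table, so agreement is only expected to one decimal place.

```python
from typing import Sequence


def weighted_total(acc_by_category: Sequence[float], n_by_category: Sequence[int]) -> float:
    """Weighted mean of per-category accuracies (%), weighted by question count."""
    total_n = sum(n_by_category)
    if total_n == 0:
        raise ValueError("no questions to average over")
    return sum(a * n for a, n in zip(acc_by_category, n_by_category)) / total_n


# Category sizes from the table header: Math, CS/AI, Bio/Med., Physics,
# Human./SS, Other, Chem., Eng.
n = [21, 2, 4, 2, 2, 3, 1, 0]

# Rows copied from the table (rounded percentages).
rows = {
    "gpt-5.2": [14.3, 50.0, 25.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    "claude-sonnet-4-6": [19.0, 50.0, 25.0, 50.0, 0.0, 0.0, 0.0, 0.0],
    "Moltbook Acc_individual": [14.3, 0.0, 0.0, 0.0, 0.0, 33.3, 0.0, 0.0],
    "Moltbook Acc_joint": [9.5, 0.0, 0.0, 0.0, 0.0, 33.3, 0.0, 0.0],
}

totals = {name: round(weighted_total(acc, n), 1) for name, acc in rows.items()}
```

Categories with n=0 (Eng. here) contribute nothing to the total, so the 0.0 shown in that column is a placeholder and not a measured accuracy.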

Tier I Helpfulness

Δ_help on 35 commented HLE questions (%). Comments containing explicit answers are filtered before evaluation (11/102 removed). Baseline: Acc(M(q)). With discussion context: Acc(M(q ⊕ C_q)). Δ_help is the difference between the two, Acc(M(q ⊕ C_q)) − Acc(M(q)). Results are mixed: three models improve, one is unchanged, and five decline.

		Math	CS/AI	Bio/Med.	Physics	Human./SS	Other	Chem.	Eng.	Total	Δ_help
		(n=21)	(n=2)	(n=4)	(n=2)	(n=2)	(n=3)	(n=1)	(n=0)	(n=35)
GPT family
gpt-5.2	Baseline	9.5	50.0	0.0	0.0	0.0	0.0	0.0	0.0	8.6
	+ Context	14.3	50.0	0.0	0.0	50.0	0.0	0.0	0.0	14.3	+5.7
gpt-5.1	Baseline	0.0	50.0	50.0	0.0	0.0	0.0	0.0	0.0	8.6
	+ Context	14.3	50.0	25.0	50.0	0.0	0.0	0.0	0.0	17.1	+8.6
gpt-5	Baseline	19.0	100	0.0	50.0	0.0	33.3	0.0	0.0	22.9
	+ Context	23.8	100	0.0	0.0	0.0	0.0	0.0	0.0	20.0	−2.9
Claude family
claude-sonnet-4-6	Baseline	14.3	50.0	25.0	50.0	0.0	0.0	0.0	0.0	17.1
	+ Context	23.8	50.0	0.0	100	0.0	0.0	0.0	0.0	22.9	+5.7
claude-sonnet-4-5	Baseline	9.5	50.0	0.0	0.0	0.0	0.0	0.0	0.0	8.6
	+ Context	0.0	50.0	25.0	0.0	0.0	0.0	0.0	0.0	5.7	−2.9
claude-sonnet-4	Baseline	0.0	0.0	25.0	0.0	0.0	0.0	0.0	0.0	2.9
	+ Context	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	−2.9
Gemini family
gemini-3-flash	Baseline	23.8	50.0	75.0	50.0	0.0	33.3	0.0	0.0	31.4
	+ Context	23.8	50.0	50.0	50.0	50.0	33.3	0.0	0.0	31.4	0.0
gemini-2.5-pro	Baseline	47.6	50.0	25.0	50.0	0.0	33.3	100	0.0	42.9
	+ Context	33.3	50.0	25.0	50.0	0.0	33.3	0.0	0.0	31.4	−11.4
gemini-2.5-flash	Baseline	28.6	0.0	0.0	50.0	0.0	0.0	0.0	0.0	20.0
	+ Context	14.3	0.0	0.0	0.0	0.0	33.3	0.0	0.0	11.4	−8.6
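
As a worked example of the Δ_help column, the sketch below computes the difference between with-context and baseline accuracy from per-question correctness flags. The function names are illustrative assumptions. The counts (3 and 5 correct out of 35) are back-calculated from the gpt-5.2 totals of 8.6% and 14.3%, and which particular questions are marked correct in the lists is arbitrary.

```python
from typing import Sequence


def accuracy(flags: Sequence[bool]) -> float:
    """Percentage of questions answered correctly."""
    if not flags:
        raise ValueError("no questions")
    return 100 * sum(flags) / len(flags)


def delta_help(baseline: Sequence[bool], with_context: Sequence[bool]) -> float:
    """Acc(M(q + C_q)) - Acc(M(q)) over the same set of questions, in points."""
    if len(baseline) != len(with_context):
        raise ValueError("both runs must cover the same questions")
    return accuracy(with_context) - accuracy(baseline)


# gpt-5.2 on the 35 commented questions: 3 correct at baseline, 5 with context
# (counts inferred from the reported percentages; flag positions are arbitrary).
baseline = [True] * 3 + [False] * 32
with_context = [True] * 5 + [False] * 30

gain = round(delta_help(baseline, with_context), 1)
```

With only 35 questions, one additional correct answer moves accuracy by about 2.9 points, so several of the reported Δ_help values correspond to a change of a single question.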

HuggingFace Collection

Datasets, model artifacts, and evaluation outputs — including socialization diagnostics and collective intelligence probing results.

View Collection

Socialization

Full analysis pipeline — socialization diagnostic framework and the interactive dashboard.

View Repository

Collective Intelligence

Probing collective intelligence in the AI agent society — evaluation framework, benchmarks, and experimental results.

View Repository
