CrowdLLM: A Synthetic Crowd Simulator for Crowdsourcing with LLM Workers
Augmented with Lightweight Generative Model
Feng (Ryan) Lin¹, Hanming Zheng², Keyu Tian², Congjing Zhang¹, Li Zeng², Shuai Huang¹
¹University of Washington ²City University of Hong Kong
Collaborators
Dr. Shuai Huang
Ryan Lin
Congjing Zhang
Keyu Tian
Hanming Zheng
Dr. Li Zeng
Agenda
Outline
Crowd-based Decision-making
Crowdsourcing
Voting
Recommender System
Digital Population
Real Population
Decision-making
Can LLMs build a realistic Digital Population?
Real humans are costly, hard to recruit, …
Limitation of LLM-based Population
[1] J. C. Yang et al., "LLM Voting: Human choices and AI collective decision-making," Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, 2024.
[2] Y. Gao, D. Lee, G. Burtch, and S. Fazelpour, "Take caution in using LLMs as human surrogates," PNAS, 2025.
Heuristic; lacks a clear analytical formulation
[Figure: voting choices of humans vs. GPT vs. Llama]
Our Proposed CrowdLLM
[Figure: CrowdLLM overview. Components: LLM, Generative Model, Profile, Individual Belief, Reference, Blender, and Decision; the Digital Population is built to mirror the Real Population's decision-making.]
Architecture of CrowdLLM
1. Reference Generation
[Figure: Task problem → Prompt → Pretrained LLM → Response → Decisions]
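A minimal sketch of the reference-generation step, assuming the task problem is a multiple-choice question (here a 1 to 5 star rating) and that the LLM is asked to answer with an option number. The prompt template, `build_prompt`, `parse_decision`, `reference_decision`, and the stub `fake_llm` are all hypothetical; `fake_llm` only stands in for a call to a real pretrained LLM.

```python
import re
from typing import Callable, Optional


def build_prompt(task: str, options: list[str]) -> str:
    """Turn a task problem into a prompt that asks for exactly one option (hypothetical template)."""
    lines = [f"Task: {task}", "Options:"]
    lines += [f"{i + 1}. {opt}" for i, opt in enumerate(options)]
    lines.append("Answer with the number of one option only.")
    return "\n".join(lines)


def parse_decision(response: str, options: list[str]) -> Optional[str]:
    """Extract the first option number in the response; None if absent or out of range."""
    match = re.search(r"\d+", response)
    if match is None:
        return None
    idx = int(match.group()) - 1
    return options[idx] if 0 <= idx < len(options) else None


def reference_decision(task: str, options: list[str],
                       llm: Callable[[str], str]) -> Optional[str]:
    """Prompt the LLM once and parse its response into a reference decision."""
    return parse_decision(llm(build_prompt(task, options)), options)


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a pretrained LLM; replace with a real client call."""
    return "I would choose 2."


options = ["1 star", "2 stars", "3 stars", "4 stars", "5 stars"]
print(reference_decision("Rate this shampoo.", options, fake_llm))
```

Returning None on unparseable or out-of-range answers keeps malformed LLM responses from silently becoming a decision.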
2. Profile Generator
We generate realistic profiles for the digital population using state-of-the-art generative models.
[Figure: a generative model maps an easy-to-sample distribution to the profile distribution of the target population]
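As a deliberately simple stand-in for a state-of-the-art generative model, the sketch below fits a multivariate Gaussian to numeric profile features and generates new profiles by transforming easy-to-sample noise z ~ N(0, I) into x = μ + Lz, where L is the Cholesky factor of the fitted covariance. The toy features (age, years of education) and the function names are hypothetical; a real profile generator would also need to handle categorical attributes and non-Gaussian structure.

```python
import numpy as np


def fit_gaussian_generator(profiles: np.ndarray):
    """Fit the mean and the Cholesky factor of the profile covariance."""
    mu = profiles.mean(axis=0)
    # Small jitter keeps the covariance positive definite for the Cholesky step.
    cov = np.cov(profiles, rowvar=False) + 1e-6 * np.eye(profiles.shape[1])
    chol = np.linalg.cholesky(cov)
    return mu, chol


def sample_profiles(mu: np.ndarray, chol: np.ndarray, n: int,
                    rng: np.random.Generator) -> np.ndarray:
    """Map easy-to-sample noise z ~ N(0, I) to x = mu + L z (one profile per row)."""
    z = rng.standard_normal((n, mu.shape[0]))
    return mu + z @ chol.T


# Toy "real population" with two correlated numeric features (hypothetical data).
rng = np.random.default_rng(0)
age = rng.normal(35, 10, size=500)
edu = 10 + 0.1 * age + rng.normal(0, 1.5, size=500)
real = np.column_stack([age, edu])

mu, chol = fit_gaussian_generator(real)
synthetic = sample_profiles(mu, chol, 10_000, rng)
print(real.mean(axis=0).round(1), synthetic.mean(axis=0).round(1))
```

The synthetic profiles reproduce the first two moments of the toy data, including the age/education correlation, which is the minimum a profile generator should preserve.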
3. Belief Generator
Age: 25 years old
Gender: Male
Occupation: Student
Marital status: Single
[Figure: a Semi-Implicit VAE (Encoder, Latent Space, Sample, Decoder) maps the profile to a Personalized Bias Distribution]
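The sketch below illustrates only the sampling path of a semi-implicit VAE, under stated assumptions: random noise ε is injected into the encoder so that the mixing variable ψ = f(x, ε) has an implicit distribution, and the latent code is then drawn from an explicit Gaussian z ~ N(ψ, σ²I). Marginalizing over ε makes q(z | x) a non-Gaussian mixture, which is what "semi-implicit" refers to. The weights are random and untrained, the numeric encoding of the example profile is hypothetical, and reading the decoder output as the mean and standard deviation of a Gaussian bias is an assumption; the training objective is not shown.

```python
import numpy as np

rng = np.random.default_rng(0)
PROFILE_DIM, NOISE_DIM, HIDDEN, LATENT = 4, 3, 16, 2

# Untrained random weights, for illustration only.
W_enc = rng.normal(0, 0.3, (PROFILE_DIM + NOISE_DIM, HIDDEN))
W_psi = rng.normal(0, 0.3, (HIDDEN, LATENT))
W_dec = rng.normal(0, 0.3, (LATENT, 2))
SIGMA = 0.1  # fixed std of the explicit Gaussian layer (assumption)


def encode_sample(profile: np.ndarray, n: int) -> np.ndarray:
    """Draw n latent codes: psi = f(x, eps) is implicit, z | psi is Gaussian."""
    eps = rng.standard_normal((n, NOISE_DIM))        # noise injected into the encoder
    x = np.repeat(profile[None, :], n, axis=0)
    h = np.tanh(np.concatenate([x, eps], axis=1) @ W_enc)
    psi = h @ W_psi                                  # implicit mixing variable
    return psi + SIGMA * rng.standard_normal((n, LATENT))


def decode_bias(z: np.ndarray):
    """Map latent codes to (mean, std) of a hypothetical Gaussian bias distribution."""
    out = z @ W_dec
    return out[:, 0], np.exp(out[:, 1])              # exp keeps the std positive


# Hypothetical encoding of the example profile:
# age / 100, gender (1 = male), student flag, single flag
profile = np.array([0.25, 1.0, 1.0, 1.0])
z = encode_sample(profile, n=5)
bias_mean, bias_std = decode_bias(z)
```

Because ε differs on every call, the same profile yields a spread of latent codes rather than a single point, so individuals who share a profile can still hold different beliefs.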
4. Blender
[Figure: LLM vs. CrowdLLM]
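One hypothetical way a blender could combine the two inputs, shown as a sketch rather than the method used in CrowdLLM: take the LLM's reference distribution over ordinal options and exponentially tilt it by a per-person bias drawn from the belief generator. `blend`, the tilting rule, and all numbers are assumptions for illustration.

```python
import numpy as np


def blend(p_ref: np.ndarray, bias: float) -> np.ndarray:
    """Tilt the LLM reference distribution over ordinal options by an individual bias.

    Hypothetical rule: logits = log p_ref + bias * centered option index.
    bias > 0 shifts mass toward higher options, bias < 0 toward lower ones.
    """
    k = np.arange(len(p_ref)) - (len(p_ref) - 1) / 2
    logits = np.log(p_ref) + bias * k
    logits -= logits.max()                 # numerical stability before exp
    w = np.exp(logits)
    return w / w.sum()


p_ref = np.array([0.05, 0.10, 0.20, 0.40, 0.25])   # LLM reference over 1-5 stars
rng = np.random.default_rng(0)
biases = rng.normal(0.0, 0.8, size=1000)            # one bias per digital worker
ratings = [int(rng.choice(5, p=blend(p_ref, b))) + 1 for b in biases]
```

With bias = 0 a digital worker reproduces the LLM reference exactly; spreading biases across the population is what lets the simulated workers disagree with one another under this rule.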
Training of CrowdLLM
[Figure: training of CrowdLLM. Profiles, Individual Belief, LLM Backbone, and Blender, optimized with a reconstruction loss and a decision-making loss]
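A minimal sketch of how the two named objectives, a reconstruction loss and a decision-making loss, might be combined into one training loss. It assumes profiles are numeric vectors, the reconstruction term is mean squared error, and the decision-making term is the negative log-likelihood of observed human decisions under the blended prediction. The weight `lam` and all function names are hypothetical; VAE-style models usually also include a KL regularizer on the latent code, omitted here to match the two losses named above.

```python
import numpy as np


def reconstruction_loss(profiles: np.ndarray, recon: np.ndarray) -> float:
    """Mean squared error between encoded profiles and their reconstructions."""
    return float(np.mean((profiles - recon) ** 2))


def decision_loss(pred_probs: np.ndarray, decisions: np.ndarray) -> float:
    """Mean negative log-likelihood of observed human decisions (option indices)."""
    eps = 1e-12                                   # guards against log(0)
    picked = pred_probs[np.arange(len(decisions)), decisions]
    return float(-np.mean(np.log(picked + eps)))


def total_loss(profiles, recon, pred_probs, decisions, lam: float = 1.0) -> float:
    """Weighted sum of both objectives; lam is a hypothetical trade-off weight."""
    return reconstruction_loss(profiles, recon) + lam * decision_loss(pred_probs, decisions)


# Toy batch of two individuals (hypothetical numbers).
profiles = np.array([[0.25, 1.0], [0.40, 0.0]])
recon = np.array([[0.20, 0.9], [0.45, 0.1]])
pred_probs = np.array([[0.1, 0.7, 0.2], [0.5, 0.25, 0.25]])
decisions = np.array([1, 0])
print(total_loss(profiles, recon, pred_probs, decisions))
```

In a real training loop both terms would be computed with an autodiff framework so gradients flow into the belief generator and the blender; NumPy is used here only to make the arithmetic explicit.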
Experimental Studies
Real-World Case Studies
Simulation Studies
Ablation Study
Data efficiency
Cost effectiveness
Voting
Amazon Product Reviews
Crowdsourcing
Belief Diversity
Case I: Voting
[Figure: comparison of state-of-the-art methods and our CrowdLLM]
Case II: Amazon Product Reviews
Beauty (448 products, 10957 users)
Music (629 products, 12396 users)
[Figure: comparison of Real Human, CrowdLLM, and LLMs]
Case III: Crowdsourcing
Experiments on multiple datasets:
Data Efficiency of CrowdLLM
Using only 1% of the human training data, CrowdLLM outperforms other LLM-based methods.
Conclusion