Data Economics: Incentives and Privacy Considerations
Juba Ziani
Georgia Tech
Gathering, exchanging, and using data
How can we use our collective data more responsibly?
Responsible data: data transactions
What if, instead of directly collecting and using (personal) data?
Economic and optimization questions:
Responsible data: data privacy
Talk outline
Disclaimer: I will mostly talk about my own work…
… but I hope it gives you an idea of the type of questions and of the landscape
Optimal Data Acquisition For Statistical Estimation
Yiling Chen, Harvard
Nicole Immorlica, MSR (now Yale)
Brendan Lucier, MSR
Vasilis Syrgkanis, MSR (now Stanford)
Juba Ziani, Caltech and MSR intern (now Georgia Tech)
EC 2018
Data interactions can be complicated…
…But let’s focus on one-sided data acquisition
Individuals
Analyst/platform
Private/sensitive
Private/sensitive data
The data acquisition problem
I want to compute a statistic or train a machine learning model on individuals’ data
Privacy cost
Privacy Costs
Data
Analyst’s goal
Medical info from individuals with rare disease
Content history, ratings, etc.
Run a study to:
Improve platform’s recommendations
Different value for privacy for different users
Give me a million dollars, your wife, your kids
3 donuts for less privacy
I want to be famous on TikTok
Challenges are 2-fold
Optimization: a naïve solution
[Figure: agents ordered from low cost to high cost, with a group of slightly higher cost agents next to the low cost agents]
Data-cost correlation: an HIV example
[Figure: agents ordered from low privacy cost to high privacy cost]
Our main result
Goal: optimize accuracy of population estimate
Constraint: limited budget, cannot buy all data points
Not just an algorithmic solution. Closed-form optimal solution!
[Figure: allocation rule (prob. of buying from agent) plotted against agent cost; figure label: B]
Challenge: data-cost correlation
[Figure: agents ordered from low cost to high cost]
Low cost agents: more data 🡺 lower weight
High cost agents: less data 🡺 higher weight
Horvitz-Thompson estimator
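A minimal sketch of the Horvitz-Thompson idea, under the assumption that each agent's data point is bought independently with a known probability. Every purchased value is divided by its purchase probability, so agents who are rarely sampled count for more. The function names, the toy population, and the probabilities 0.9 and 0.3 are illustrative assumptions, not taken from the paper.

```python
import random


def horvitz_thompson_mean(values, probs, rng):
    """Inverse-probability-weighted estimate of the population mean."""
    n = len(values)
    total = 0.0
    for x, p in zip(values, probs):
        if rng.random() < p:   # this agent's data point is purchased
            total += x / p     # rarely purchased agents get a higher weight
    return total / n


def naive_sample_mean(values, probs, rng):
    """Plain average of the purchased points, ignoring purchase probabilities."""
    sampled = [x for x, p in zip(values, probs) if rng.random() < p]
    return sum(sampled) / len(sampled) if sampled else 0.0


def average_estimate(estimator, values, probs, trials=2000, seed=0):
    """Average an estimator over many independent sampling rounds."""
    rng = random.Random(seed)
    return sum(estimator(values, probs, rng) for _ in range(trials)) / trials


# Hypothetical population in which data and cost are correlated:
# low-cost agents hold value 0 and are bought often,
# high-cost agents hold value 1 and are bought rarely.
values = [0.0] * 50 + [1.0] * 50   # true mean is 0.5
probs = [0.9] * 50 + [0.3] * 50
```

Calling `average_estimate(horvitz_thompson_mean, values, probs)` should land close to the true mean of 0.5, while `average_estimate(naive_sample_mean, values, probs)` is pulled toward the over-represented low-cost agents.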
Reduction to min-max
Allocation rule: prob. of buying
Data distribution
P1: analyst
P2: adversary
Zero-sum game
The techniques: best response computation
A few remarks
“Prior-free Data Acquisition for Accurate Statistical Estimation” by Y. Chen, S. Zheng – EC’19
Talk outline
Optimal Data Acquisition with Privacy-Aware Agents
Rachel Cummings, Columbia
Hadi Elzayn, Stanford
Vasilis Gkatzelis, Drexel
Manolis Pountourakis, Drexel
Juba Ziani, Georgia Tech
(Best Paper at SaTML 2023)
The data acquisition problem
Individuals
Analyst/platform
Private/sensitive
Private/sensitive data
Privacy cost
Differential Privacy
How does the analyst protect privacy?
“Close” distribution of outputs when changing only one data entry
Formal definition
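For reference, the standard definition says that a randomized mechanism M is ε-differentially private if, for any two datasets D and D' that differ in one entry and any set of outputs S, Pr[M(D) ∈ S] ≤ e^ε · Pr[M(D') ∈ S]. The sketch below illustrates one common way to satisfy it, the Laplace mechanism applied to a mean of values in [0, 1]. The function names and the toy data are assumptions made for illustration.

```python
import random


def laplace_noise(scale, rng):
    """Laplace(0, scale) sample, as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_mean(values, epsilon, rng):
    """Epsilon-DP estimate of the mean of values clipped to [0, 1]."""
    n = len(values)
    clipped = [min(1.0, max(0.0, x)) for x in values]
    # Changing one entry moves the mean of n clipped values by at most 1/n.
    sensitivity = 1.0 / n
    return sum(clipped) / n + laplace_noise(sensitivity / epsilon, rng)


def mean_abs_error(values, epsilon, trials=2000, seed=0):
    """Average absolute error of the private mean over repeated runs."""
    rng = random.Random(seed)
    true_mean = sum(values) / len(values)
    errors = (abs(private_mean(values, epsilon, rng) - true_mean)
              for _ in range(trials))
    return sum(errors) / trials


# Hypothetical dataset of 100 values in [0, 1].
data = [0.2, 0.8] * 50
```

A smaller ε means a larger noise scale: comparing `mean_abs_error(data, 0.1)` with `mean_abs_error(data, 10.0)` shows the error growing as the privacy guarantee gets stronger.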
Privacy-accuracy trade-offs, informally
Differential privacy on its own is easy to obtain!
… don’t use the data and get perfect privacy
But we still want the data to be useful:
How can the analyst compensate agents?
Much of the previous work on data acquisition with privacy:
Our work + follow-up of [Fallah et al. 2024]:
The model – Analyst side
Goal: compute population mean
Data reported by agent i
Weight for agent i
Weights sum to 1
Analyst’s design space
Amount of noise to add for privacy
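A minimal sketch of an estimator with this design space: a weighted sum of the reported data plus noise. It assumes reports lie in [0, 1] and that the noise is Laplace, neither of which is stated on the slide. Under those assumptions, changing agent i's report moves the weighted sum by at most her weight, so her privacy level is her weight divided by the noise scale. All names below are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    """Laplace(0, scale) sample, as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def weighted_private_estimate(reports, weights, noise_scale, rng):
    """Weighted sum of agents' reports plus Laplace noise (assumed noise model)."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    weighted_sum = sum(w * x for w, x in zip(weights, reports))
    return weighted_sum + laplace_noise(noise_scale, rng)


def per_agent_epsilon(weights, noise_scale):
    """Privacy level of each agent, assuming reports in [0, 1]."""
    # Agent i's report shifts the weighted sum by at most weights[i].
    return [w / noise_scale for w in weights]


# Hypothetical instance with three agents.
reports = [1.0, 0.0, 1.0]
weights = [0.5, 0.3, 0.2]
```

In this sketch, `per_agent_epsilon(weights, 0.1)` shows that an agent with a larger weight receives a weaker privacy guarantee, which is why the weights and the noise level are chosen jointly.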
The model – Agent side
Goal: compute population mean
The optimization problem
Optimize accuracy of estimator
Participation constraint
Unbiased model
Results
Can characterize optimal solution…
… Quadratic program 🡺 KKT conditions are useful
[Figures: two plots with axes labeled weight and cost]
Result holds in variant of agent model
Some insights into the solution
Intuition:
Incentive properties
What if the platform does not know agents’ privacy costs, and agents can lie about costs?
Original model:
Variant:
Data – cost correlation
What if data and costs are correlated?
Challenge:
Some thoughts/directions:
Talk outline
Back to two-sided markets
[Agarwal, Dahleh, Sarkar, EC19] in 1 picture
Platform can share/re-sell data
Equilibria of Data Marketplaces with Privacy-Aware Sellers under Endogenous Privacy Costs
Diptangshu Sen, Jingyan Wang, Juba Ziani
SaTML 2025
Correlation between agents
Too Much Data: Prices and Inefficiencies in Data Markets
Daron Acemoglu, Ali Makhdoumi, Azarakhsh Malekian, Asu Ozdaglar
American Economic Journal 2022
Correlation between agents
The Privacy Paradox and Optimal Bias-Variance Trade-offs in Data Acquisition
Guocheng Liao, Yu Su, Jianwei Huang, Adam Wierman, Juba Ziani
EC 21, Math of OR 23
Talk outline
Responsible data: fairness in decision-making
Major Opportunity for Fairness
Boran Otay Dabak, shamelessly stolen from a Medium piece he wrote
Some positive news
Instead of seeing data as fixed, taking data production incentives into account helps get rid of fairness-accuracy-cost trade-offs…
The Cost of Balanced Training-Data Production in an Online Data Market
Augustin Chaintreau, Roland Maio, Juba Ziani
TheWebConf 2025
Talk outline
So far, stylized models!
Do people really know their value for data or for privacy?
X = My data
Y = Data I am buying
My value = f(X,Y)
Do we really believe in marketplaces?
Why a data intermediary/what do they bring?
Are data markets really 2-sided?
This picture is oversimplified
Research vs practical concerns?
Practical concerns from data sellers do not quite seem to align with current research.
Research: Differential Privacy
Real-life concerns
Lots of opportunities, and new research coming up!