1 of 97

Data Economics: Incentives and Privacy Considerations

Juba Ziani, Georgia Tech

2 of 97

Gathering, exchanging, and using data

3 of 97

Gathering, exchanging, and using data

4 of 97

Gathering, exchanging, and using data

How can we use our collective data more responsibly?

5 of 97

Responsible data: data transactions

6 of 97

Responsible data: data transactions

What if, instead of directly collecting and using (personal) data, we:

    • Ask for consent, letting individuals opt in or out of sharing data?
    • Compensate individuals appropriately for their data?

7 of 97

Responsible data: data transactions

Economic and optimization questions:

    • What data, and how much, to collect or buy, and from which sources?
    • How to design data markets and intermediaries?

8 of 97

Responsible data: data privacy

9 of 97

Responsible data: data privacy

    • How to perform meaningful computation on sensitive data while preserving agents’ privacy?
    • Differential privacy!

10 of 97

Talk outline

Disclaimer: I will mostly talk about my own work…

… but I hope it gives you an idea of the types of questions and of the landscape

11 of 97

Talk outline

  1. Economics of data with privacy as a hurdle

  2. Differential privacy as a knob in data economies

  3. Quick note on two-sided markets

  4. What do we do next?

12 of 97

Talk outline

  1. Economics of data with privacy as a hurdle

  2. Differential privacy as a knob in data economies

  3. Quick note on two-sided markets

  4. What do we do next?

13 of 97

Optimal Data Acquisition For Statistical Estimation

Yiling Chen, Harvard

Nicole Immorlica, MSR (now Yale)

Brendan Lucier, MSR

Vasilis Syrgkanis, MSR (now Stanford)

Juba Ziani, Caltech and MSR intern (now Georgia Tech)

EC 2018

14 of 97

Data interactions can be complicated…

15 of 97

…But let’s focus on one-sided data acquisition

Individuals

Analyst/platform

Private/sensitive

Private/sensitive data

16 of 97

The data acquisition problem

Individuals

Analyst/platform

Private/sensitive

I want to run a statistic/train a machine learning model on individuals’ data

Private/sensitive data

17 of 97

The data acquisition problem

Individuals

Private/sensitive

Private/sensitive data

Analyst/platform

18 of 97

The data acquisition problem

Individuals

Private/sensitive

Private/sensitive data

Privacy cost

Analyst/platform

19 of 97

Privacy Costs

Data: Medical info from individuals with rare disease

Analyst’s goal: Run a study to:

  • better understand rare disease
  • obtain better treatments

Data: Content history, ratings, etc.

Analyst’s goal: Improve platform’s recommendations

20 of 97

The data acquisition problem

Individuals

Analyst/platform

Private/sensitive

Private/sensitive data

  • Provide privacy
  • Compensate for remaining privacy losses

Privacy cost

21 of 97

The data acquisition problem

Individuals

Analyst/platform

Private/sensitive

Private/sensitive data

  • Provide privacy
  • Compensate for remaining privacy losses

Privacy cost

22 of 97

Different value for privacy for different users

23 of 97

The data acquisition problem

Individuals

Analyst/platform

Private/sensitive

Private/sensitive data

  • Provide privacy
  • Compensate for remaining privacy losses

Privacy cost

Give me a million dollars, your wife, your kids

24 of 97

The data acquisition problem

Individuals

Analyst/platform

Private/sensitive

Private/sensitive data

  • Provide privacy
  • Compensate for remaining privacy losses

Privacy cost

Give me a million dollars, your wife, your kids

3 donuts for less privacy

25 of 97

The data acquisition problem

Individuals

Analyst/platform

Private/sensitive

Private/sensitive data

  • Provide privacy
  • Compensate for remaining privacy losses

Privacy cost

Give me a million dollars, your wife, your kids

3 donuts for less privacy

I want to be famous on TikTok

26 of 97

Challenges are 2-fold

  1. Optimization challenge: optimize trade-off between
    • Quality/accuracy of final statistical model (more data 🡺 more accuracy)
    • Budget spent purchasing data (more data 🡺 more costly)

  2. Mechanism design challenge:
    • Strategic agents: do not participate if low payment
    • Strategic agents: can lie/overstate costs

27 of 97

Optimization: a naïve solution

Low cost agents

High cost agents

. . . . . . . . . . . . .

28 of 97

Optimization: a naïve solution

Low cost agents

High cost agents

. . . . . . . . . . . . .

29 of 97

Optimization: a naïve solution

Low cost agents

High cost agents

. . . . . .

Slightly higher cost agents

30 of 97

Optimization: a naïve solution

Low cost agents

High cost agents

. . . . . .

Slightly higher cost agents

31 of 97

Data-cost correlation: an HIV example

Low privacy cost agents

High privacy cost agents

. . . . . . . . . . . . .
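A minimal sketch of why data-cost correlation hurts, assuming the naïve solution means buying the cheapest data first until the budget runs out. The toy population, the budget, and the function names below are hypothetical and are not taken from the talk.

```python
# Hypothetical toy population of (privacy_cost, data) pairs.
# Data and cost are correlated: agents with data = 1 (e.g. a sensitive
# diagnosis) are exactly the ones with a high privacy cost.
POPULATION = [
    (1.0, 0), (1.0, 0), (2.0, 0), (2.0, 0), (3.0, 0),
    (3.0, 0), (8.0, 1), (9.0, 1), (9.0, 1), (10.0, 1),
]


def population_mean(population):
    """True mean of the data, which the analyst would like to estimate."""
    return sum(z for _, z in population) / len(population)


def naive_cheapest_first(population, budget):
    """Buy data in increasing order of cost until the budget is exhausted,
    then return (plain average of purchased data, number of points bought)."""
    bought = []
    spent = 0.0
    for cost, z in sorted(population):
        if spent + cost > budget:
            break
        spent += cost
        bought.append(z)
    if not bought:
        return None, 0
    return sum(bought) / len(bought), len(bought)


true_mean = population_mean(POPULATION)
naive_estimate, num_bought = naive_cheapest_first(POPULATION, budget=12.0)
print(true_mean, naive_estimate, num_bought)
```

In this toy instance the budget only reaches low-cost agents, so the plain average never sees a high-cost data point and the estimate is biased regardless of how many cheap points are bought.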

32 of 97

Our main result

Goal: optimize accuracy of population estimate

Constraint: limited budget, cannot buy all data points

33 of 97

Our main result

Goal: optimize accuracy of population estimate

Constraint: limited budget, cannot buy all data points

Not just an algorithmic solution. Closed-form optimal solution!

34 of 97

Our main result

Agent cost

Allocation rule/prob. of buying from agent

Goal: optimize accuracy of population estimate

Constraint: limited budget, cannot buy all data points

B

 

35 of 97

Challenge: data-cost correlation

Low cost agents: more data

High cost agents: less data

36 of 97

Challenge: data-cost correlation

Low cost agents: more data

High cost agents: less data

Horvitz-Thompson estimator

37 of 97

Challenge: data-cost correlation

Low cost agents: more data 🡺 lower weight

High cost agents: less data 🡺 higher weight

Horvitz-Thompson estimator

  • Uses importance weighting

38 of 97

Challenge: data-cost correlation

Low cost agents

High cost agents

. . . . . . . . . . . . .

Horvitz-Thompson estimator

  • Uses importance weighting
  • Unique unbiased linear estimator
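A small sketch of the importance-weighting idea behind the Horvitz-Thompson estimator: each purchased data point is divided by its probability of being purchased, so rarely bought (high-cost) agents get a higher weight. The population and the allocation rule `alloc` are hypothetical choices for illustration, not the optimal rule from the paper.

```python
import random

# Hypothetical toy population of (privacy_cost, data) pairs with
# data-cost correlation: the expensive agents hold data = 1.
POPULATION = [
    (1.0, 0), (1.0, 0), (2.0, 0), (2.0, 0), (3.0, 0),
    (3.0, 0), (8.0, 1), (9.0, 1), (9.0, 1), (10.0, 1),
]


def alloc(cost):
    """Assumed allocation rule: probability of buying from an agent with
    this cost. Cheap data is bought often, expensive data rarely but with
    strictly positive probability."""
    return min(1.0, 2.0 / cost)


def horvitz_thompson(purchased, n):
    """Importance-weighted mean over a population of size n.
    `purchased` is a list of (data, purchase_probability) pairs."""
    return sum(z / p for z, p in purchased) / n


def run_once(population, rng):
    """Sample which agents are bought from, then form the estimate."""
    purchased = []
    for cost, z in population:
        p = alloc(cost)
        if rng.random() < p:
            purchased.append((z, p))
    return horvitz_thompson(purchased, len(population))


rng = random.Random(0)
trials = 20_000
average_estimate = sum(run_once(POPULATION, rng) for _ in range(trials)) / trials
true_mean = sum(z for _, z in POPULATION) / len(POPULATION)
print(true_mean, round(average_estimate, 3))
```

Averaged over many independent runs, the estimate concentrates around the true mean, which illustrates unbiasedness. Any single run can still be far off, and that variance is what the allocation rule is designed to control.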

39 of 97

Reduction to min-max

  •  

Allocation rule: prob. of buying

Data

distribution

40 of 97

Reduction to min-max

  •  

41 of 97

Reduction to min-max

  •  

P2: adversary

P1: analyst

Zero-sum game

42 of 97

Reduction to min-max

  •  

P2: adversary

P1: analyst

Zero-sum game
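A hedged sketch of how such a min-max problem can arise, in assumed notation that is not taken from the slides: $A(c)$ is the allocation rule, $P(c)$ the expected payment, $B$ the budget, and $(c_i, z_i)$ are i.i.d. cost-data pairs drawn from a distribution $D$ whose data-cost correlation the analyst does not know. The exact objective and constraint used in the paper may differ.

```latex
% Horvitz-Thompson estimate of the mean \mu_D = E_D[z]; X_i = 1 if agent i's
% data is bought, which happens with probability A(c_i).
\hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} \frac{z_i X_i}{A(c_i)},
\qquad
\mathbb{E}[\hat{\mu}] = \mu_D,
\qquad
\mathrm{Var}(\hat{\mu}) = \frac{1}{n}\left( \mathbb{E}_{(c,z) \sim D}\!\left[ \frac{z^2}{A(c)} \right] - \mu_D^2 \right).

% The analyst (P1) picks A; an adversary (P2) picks the worst-case D
% among distributions consistent with what the analyst knows.
\min_{A} \; \max_{D} \;\; \mathbb{E}_{(c,z) \sim D}\!\left[ \frac{z^2}{A(c)} \right] - \mu_D^2
\quad \text{s.t.} \quad n \, \mathbb{E}_{c}\left[ P(c) \right] \le B .
```

The variance formula follows because each term $z_i X_i / A(c_i)$ has mean $\mu_D$ and second moment $\mathbb{E}[z^2 / A(c)]$, assuming $X_i$ is independent of $z_i$ given $c_i$.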

43 of 97

The techniques: best response computation

  •  

 

44 of 97

A few remarks

  • Mechanism design?
    • Get truthfulness (agents report true costs) via the right payment rule
    • Payment rule described in [Myerson 1981]
    • Reduces to optimization problem of the form previously described

  • Relaxing assumptions?

“Prior-free Data Acquisition for Accurate Statistical Estimation” by Y. Chen, S. Zheng – EC’19

    • Mechanism functions online; adapts to arrivals of data providers over time
    • Allow biased estimator + study bias-variance trade-off
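For the payment rule mentioned in the remarks above, here is a minimal sketch of the standard Myerson-style payment in a procurement setting. It assumes each agent has a single private cost, pays that cost only when its data is bought, and faces an allocation rule that is non-increasing in the reported cost. The linear rule `alloc`, the bound `C_MAX`, and the function names are hypothetical, and the normalization used in the paper may differ.

```python
C_MAX = 10.0  # assumed upper bound on costs


def alloc(cost):
    """Hypothetical non-increasing allocation rule on [0, C_MAX]."""
    return 1.0 - cost / C_MAX


def myerson_payment(alloc_rule, cost, c_max, steps=10_000):
    """Expected payment  cost * A(cost) + integral_{cost}^{c_max} A(t) dt,
    with the integral approximated by the trapezoid rule."""
    if cost >= c_max:
        return cost * alloc_rule(cost)
    h = (c_max - cost) / steps
    integral = 0.5 * (alloc_rule(cost) + alloc_rule(c_max))
    for k in range(1, steps):
        integral += alloc_rule(cost + k * h)
    integral *= h
    return cost * alloc_rule(cost) + integral


def expected_utility(true_cost, reported_cost):
    """Expected payment for the report, minus the true expected privacy cost."""
    payment = myerson_payment(alloc, reported_cost, C_MAX)
    return payment - true_cost * alloc(reported_cost)


true_cost = 4.0
utilities = {r: expected_utility(true_cost, r) for r in (0.0, 2.0, 4.0, 6.0, 8.0, 10.0)}
best_report = max(utilities, key=utilities.get)
print(best_report, utilities)
```

Among the reports tried, the truthful one gives the agent the highest expected utility. Overstating the cost raises the payment per purchase but lowers the chance of being bought from, and the integral term balances the two.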

45 of 97

Talk outline

  1. Economics of data with privacy as a hurdle

  2. Differential privacy as a knob in data economies

  3. Quick note on two-sided markets

  4. What do we do next?

46 of 97

Optimal Data Acquisition with Privacy-Aware Agents

Rachel Cummings, Columbia

Hadi Elzayn, Stanford

Vasilis Gkatzelis, Drexel

Manolis Pountourakis, Drexel

Juba Ziani, Georgia Tech

(Best Paper at SATML 2023)

47 of 97

The data acquisition problem

Individuals

Analyst/platform

Private/sensitive

Private/sensitive data

  • Provide privacy
  • Compensate for remaining privacy losses

Privacy cost

48 of 97

Differential Privacy

 

49 of 97

How does the analyst protect privacy?

 

“Close” distribution of outputs when changing only one data entry

50 of 97

How does the analyst protect privacy?

 

51 of 97

Formal definition

  •  
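For reference, the standard definition of $\varepsilon$-differential privacy, written in assumed notation ($M$, $D$, $D'$ and $S$ are not taken from the slides):

```latex
% A randomized mechanism M is \varepsilon-differentially private if, for all
% pairs of datasets D, D' that differ in a single entry and for all sets S
% of possible outputs,
\Pr\left[ M(D) \in S \right] \;\le\; e^{\varepsilon} \, \Pr\left[ M(D') \in S \right].
```

A smaller $\varepsilon$ forces the two output distributions to be closer, which is the formal version of "close distribution of outputs when changing only one data entry".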

52 of 97

Privacy-accuracy trade-offs, informally

Differential privacy on its own is easy to obtain!

… don’t use the data and get perfect privacy

But we still want the data to be useful:

    • Trade-off between privacy and accuracy
    • Lower epsilon 🡺 more noise 🡺 more privacy but less accuracy
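The trade-off above can be illustrated with the Laplace mechanism, a standard way to obtain ε-differential privacy for a bounded mean. This is a sketch under assumptions: the data lies in [0, 1], neighboring datasets differ by replacing one entry, and the function names are hypothetical. The talk does not say which mechanism it uses.

```python
import random


def laplace_noise(scale, rng):
    """Laplace(0, scale) sample: the difference of two independent
    exponential random variables with rate 1, multiplied by the scale."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def private_mean(data, epsilon, rng, lower=0.0, upper=1.0):
    """Mean of data clipped to [lower, upper], plus Laplace noise.
    Replacing one entry moves the clipped mean by at most
    (upper - lower) / n, so the noise scale is sensitivity / epsilon."""
    n = len(data)
    clipped = [min(max(x, lower), upper) for x in data]
    sensitivity = (upper - lower) / n
    return sum(clipped) / n + laplace_noise(sensitivity / epsilon, rng)


rng = random.Random(0)
data = [i % 2 for i in range(100)]  # hypothetical binary data, mean 0.5
true_mean = sum(data) / len(data)

# Average absolute error of the private mean for several privacy levels.
mean_abs_error = {}
for epsilon in (0.1, 1.0, 10.0):
    errors = [abs(private_mean(data, epsilon, rng) - true_mean) for _ in range(2000)]
    mean_abs_error[epsilon] = sum(errors) / len(errors)

print(mean_abs_error)
```

The noise scale is inversely proportional to epsilon, so the average error shrinks as epsilon grows: less privacy, more accuracy.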

53 of 97

The data acquisition problem

Individuals

Analyst/platform

Private/sensitive

Private/sensitive data

  • Provide privacy
  • Compensate for remaining privacy losses

Privacy cost

54 of 97

How can the analyst compensate agents?

Much of the previous work on data acquisition with privacy:

  • Agents do not care about outcome of analyst’s statistic/model
  • Analyst/Platform compensates agents only through payments

Our work:

  • Agents get some utility from the outcome of the statistic. Ex:
    • Better understanding of HIV benefits patients
    • Better recommendation system benefits Netflix users
  • Better/more accurate model 🡺 increased agent utility.

55 of 97

How can the analyst compensate agents?

Much of the previous work on data acquisition with privacy:

  • Agents do not care about outcome of analyst’s statistic/model
  • Analyst/Platform compensates agents only through payments

Our work + follow-up of [Fallah et al. 2024]:

  • Agents get some utility/benefit from the outcome of the statistic.
    • Treatment/study outcome may benefit participants in medical study
    • Better recommendation system benefits platform’s users
  • Better/more accurate model 🡺 increased agent utility.

56 of 97

The model – Analyst side

Goal: compute population mean

  1. n individuals decide whether to participate in the platform or study
  2. Observe the data of the participants
  3. Compute a noisy, weighted average of the participants’ data
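Step 3 can be sketched as follows. This assumes Laplace noise and data in [0, 1]; the slides only say "noisy", so the noise distribution and the function names are assumptions. Under these assumptions, changing agent i's data moves the weighted sum by at most her weight, so a lower weight or more noise means more privacy for that agent.

```python
import random


def noisy_weighted_mean(data, weights, noise_scale, rng):
    """Weighted average of the participants' data plus Laplace noise.
    The weights and the noise scale are the analyst's design choices."""
    if len(data) != len(weights):
        raise ValueError("need exactly one weight per participant")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    weighted = sum(w * x for w, x in zip(weights, data))
    # Laplace(0, noise_scale) as a scaled difference of two exponentials.
    noise = noise_scale * (rng.expovariate(1.0) - rng.expovariate(1.0))
    return weighted + noise


def per_agent_epsilon(weights, noise_scale):
    """Privacy level of each agent under the assumptions above:
    epsilon_i = w_i / noise_scale (smaller means more private)."""
    return [w / noise_scale for w in weights]


rng = random.Random(0)
data = [0.0, 1.0, 1.0]        # hypothetical reports in [0, 1]
weights = [0.5, 0.25, 0.25]   # hypothetical weights, summing to 1

noiseless = noisy_weighted_mean(data, weights, noise_scale=0.0, rng=rng)
estimate = noisy_weighted_mean(data, weights, noise_scale=0.5, rng=rng)
epsilons = per_agent_epsilon(weights, noise_scale=0.5)
print(noiseless, estimate, epsilons)
```

Giving an agent a smaller weight protects her more but uses her data less, which is why the weights and the noise level are optimized jointly.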

57 of 97

The model – Analyst side

 

58 of 97

The model – Analyst side

 

Data reported by agent i

59 of 97

The model – Analyst side

 

Weight for agent i

Weights sum to 1

Analyst’s

design space

Amount of noise

to add for privacy

60 of 97

The model – Agent side

Goal: compute population mean

  1. n individuals decide whether to participate in the platform or study
  2. Observe the data of the participants (i.i.d.)
  3. Compute a noisy, weighted average of the participants’ data

61 of 97

The model – Agent side

 

62 of 97

The model – Agent side

 

 

63 of 97

The model – Agent side

 

64 of 97

The model – Agent side

 

65 of 97

The optimization problem

 

66 of 97

The optimization problem

 

Optimize accuracy of estimator

67 of 97

The optimization problem

 

Participation constraint

68 of 97

The optimization problem

 

Unbiased model

69 of 97

Results

 

70 of 97

 

71 of 97

 

Can characterize optimal solution…

… Quadratic program 🡺 KKT conditions are useful

72 of 97

 

Weight

Cost

73 of 97

 

Weight

Cost

 

74 of 97

Result holds in variant of agent model

 

75 of 97

Some insights into the solution

  • Every agent is given a strictly positive weight.
  • This means optimal accuracy is obtained when we incentivize every agent to participate, no matter what their privacy requirement is!

Intuition:

  • Self-reinforcing effect:
    • more participation 🡺 increased accuracy 🡺 more participation
    • more participation 🡺 less noise for privacy 🡺 increased accuracy 🡺 more participation

76 of 97

Incentive properties

What if the platform does not know agents’ privacy costs, and agents can lie about costs?

Original model:

  • See follow-up of [Fallah et al. 2024] for truthful mechanisms

Variant:

  • Incentives are well-aligned: both platform and users want best model.
  • Get truthfulness of costs for free:
    • Understating cost is bad for privacy
    • Overstating cost is bad for model utility

77 of 97

Data – cost correlation

What if data and costs are correlated?

Challenge:

  • Need differential privacy on the privacy costs
  • Known to be impossible in the most general case (need to ignore privacy preferences) [Nissim, Vadhan, Xiao]

Some thoughts/directions:

  • Additional structure: cost increases in data (monotonicity)
  • Relaxed DP definition: [Chaudhuri and Courtade 2025]

78 of 97

Talk outline

  1. Economics of data with privacy as a hurdle

  2. Differential privacy as a knob in data economies

  3. Quick note on two-sided markets

  4. What do we do next?

79 of 97

Back to two-sided markets

80 of 97

Back to two-sided markets

81 of 97

[Agarwal, Dahleh, Sarkar, EC19] in 1 picture

82 of 97

Platform can share/re-sell data

Equilibria of Data Marketplaces with Privacy-Aware Sellers under Endogenous Privacy Costs

Diptangshu Sen, Jingyan Wang, Juba Ziani

SATML 2025

83 of 97

Correlation between agents

Too Much Data: Prices and Inefficiencies in Data Markets

Daron Acemoglu, Ali Makhdoumi, Azarakhsh Malekian, Asu Ozdaglar

American Economic Journal 2022

84 of 97

Correlation between agents

The Privacy Paradox and Optimal Bias-Variance Trade-offs in Data Acquisition

Guocheng Liao, Yu Su, Jianwei Huang, Adam Wierman, Juba Ziani

EC 21, Math of OR 23

85 of 97

Talk outline

  1. Economics of data with privacy as a hurdle

  2. Differential privacy as a knob in data economies

  3. Quick note on two-sided markets

  4. What do we do next?

86 of 97

Responsible data: fairness in decision-making

87 of 97

Major Opportunity for Fairness

Boran Otay Dabak, shamelessly stolen from a Medium piece he wrote

88 of 97

Major Opportunity for Fairness

Boran Otay Dabak, shamelessly stolen from a Medium piece he wrote

    • Beyond techniques to correct data imbalance in given dataset (usual static view) …
    • … move on to reasoning about *incentives* to produce better datasets (non-static)

89 of 97

Some positive news

Instead of seeing data as fixed, taking data production incentives into account helps get rid of fairness-accuracy-cost trade-offs…

The Cost of Balanced Training-Data Production in an Online Data Market

Augustin Chaintreau, Roland Maio, Juba Ziani

TheWebConf 2025

90 of 97

Talk outline

  1. Economics of data with privacy as a hurdle

  2. Differential privacy as a knob in data economies

  3. Quick note on two-sided markets

  4. What do we do next?

91 of 97

So far, stylized models!

Do people really know their value for data or for privacy?

X = My data

Y = Data I am buying

My value = f(X,Y)

  • Non-i.i.d.: complementarities
  • Need to know a lot about Y

92 of 97

Do we really believe in marketplaces?

Why a data intermediary/what do they bring?

93 of 97

Are data markets really 2-sided?

This picture is oversimplified

94 of 97

Are data markets really 2-sided?

95 of 97

Research vs practical concerns?

Practical concerns from data sellers do not quite seem to align with current research.

Research: Differential Privacy

Real-life concerns

96 of 97

Lots of opportunities, and new research coming up!

97 of 97

Data Economics: Incentives and Privacy Considerations

Juba Ziani, Georgia Tech