1 of 11

OpenDP Usability and DP Wizard

Chuck McCallum (cmccallum@g.harvard.edu)

  • Review recent work to improve the usability of OpenDP
  • What are the goals for DP Wizard?
  • Walk through
  • Feedback welcome!

2 of 11

Solid foundation

Rust core / Floating-point-safe algorithms / Modular architecture

What have we done recently to improve usability?

3 of 11

Context API

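import opendp.prelude as dp

context = dp.Context.compositor(
    data=dataframe,
    privacy_unit=unit,      # e.g. dp.unit_of(contributions=1)
    privacy_loss=loss,      # e.g. dp.loss_of(epsilon=1.0)
    split_evenly_over=4,    # budget is divided equally among 4 queries
)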
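As a rough illustration of what split_evenly_over=4 implies, the sketch below divides a total epsilon equally among four queries, assuming basic sequential composition for pure DP. The helper split_budget is hypothetical and not part of OpenDP; the library's own accounting may differ in detail.

```python
def split_budget(total_epsilon: float, num_queries: int) -> list[float]:
    """Hypothetical helper: divide a total epsilon evenly among queries."""
    if num_queries < 1:
        raise ValueError("num_queries must be at least 1")
    if total_epsilon <= 0:
        raise ValueError("total_epsilon must be positive")
    return [total_epsilon / num_queries] * num_queries


shares = split_budget(total_epsilon=1.0, num_queries=4)
print(shares)  # each of the 4 queries gets epsilon = 0.25
```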

Polars

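import polars as pl

context.query().select(
    pl.col("score")
    .cast(int)
    .fill_null(0)              # replace missing scores with 0
    .dp.sum(bounds=(0, 80)),   # clamp to [0, 80], then release a noisy sum
    dp.len(),                  # noisy row count
)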
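To make the role of the bounds concrete, here is a plain-Python sketch of the general idea behind a bounded DP sum: fill missing values, clamp to the bounds, sum, and add Laplace noise scaled to the sensitivity. It assumes each person contributes at most one row, and it is not floating-point safe. It is an illustration only, not how OpenDP implements the query; all function names are hypothetical.

```python
import random


def clamped_sum(values, lower, upper, fill=0):
    """Replace missing values with `fill`, clamp to [lower, upper], then sum."""
    total = 0
    for v in values:
        v = fill if v is None else v
        total += min(max(v, lower), upper)
    return total


def laplace_noise(scale, rng):
    """The difference of two Exp(1) draws is Laplace-distributed; rescale it."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def noisy_sum(values, bounds, epsilon, rng):
    lower, upper = bounds
    # Adding or removing one row changes the clamped sum by at most this much.
    sensitivity = max(abs(lower), abs(upper))
    return clamped_sum(values, lower, upper) + laplace_noise(sensitivity / epsilon, rng)


scores = [72, None, 95, -3, 40]
exact = clamped_sum(scores, 0, 80)  # 72 + 0 + 80 + 0 + 40
release = noisy_sum(scores, bounds=(0, 80), epsilon=1.0, rng=random.Random(0))
print(exact, release)
```

Wider bounds keep more of the raw data but raise the sensitivity, and so the noise; that trade-off is why the bounds are an explicit argument.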

Plugins

dp.m.make_user_measurement(
    input_domain=domain,
    input_metric=metric,
    output_measure=measure,
    function=function,
    privacy_map=privacy_map,
    TO=type(constant),
)

4 of 11

Better Documentation

Getting Started / API / Theory / Contributing

DP Wizard

Local CSV → Preview → Results + Notebook

5 of 11

DP Wizard goals

  • Take what we’ve learned from DP Creator
    • Ease of installation is important
    • Researchers may balk at uploading private data
    • Prototype quickly and test assumptions
  • Use the latest features from OpenDP
    • Context API
    • Polars Dataframes
  • Try something new
    • Shiny for Python
    • Code and notebook generation
  • Model best practices for DP!

6 of 11

$ pip install dp_wizard

$ dp-wizard --demo

Application runs locally,

typically with a private CSV you provide.

Providing a public CSV for comparison is also possible.

Specify your dataset first, and then the details of your analysis.

(Discourage users from tweaking contributions for better results!)

Along the way, show OpenDP code samples.

(Full notebook download at the end.)

7 of 11

Select a column, and then specify your analysis of that column.

By itself, just asking for epsilon isn’t great for users new to DP, but with the interactive visualization…

8 of 11

[Screenshots: interactive visualization shown at ε=0.5 and ε=1.0, with estimated rows of 1,000 and 10,000]
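One way to read the epsilon and estimated-rows settings: for a histogram-style count, the Laplace noise scale depends only on epsilon, while the bin counts grow with the number of rows, so the same epsilon gives relatively cleaner results on a larger dataset. The sketch below assumes sensitivity 1 (each person contributes one row), rows spread evenly over ten bins, and no budget splitting. The function name is hypothetical and this is not DP Wizard's own simulation code.

```python
def relative_noise(epsilon: float, rows: int, bins: int = 10) -> float:
    """Expected |Laplace noise| per bin, as a fraction of the average bin count."""
    noise_scale = 1.0 / epsilon      # Laplace scale for a sensitivity-1 count
    average_bin_count = rows / bins  # assumes rows are spread evenly over bins
    return noise_scale / average_bin_count


for epsilon in (0.5, 1.0):
    for rows in (1_000, 10_000):
        print(f"epsilon={epsilon}, rows={rows:,}: {relative_noise(epsilon, rows):.2%}")
```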

9 of 11

For now, we’re not trying to track a privacy budget across invocations. If you choose to rerun the analysis many times, or just look at the data, stopping you is outside the scope of this tool.

If you don’t even want to make a DP release, that is also possible.
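Why reruns matter: under basic sequential composition for pure DP, the epsilon spent on each release adds up. The sketch below is a back-of-the-envelope calculation, not a feature of DP Wizard, and the function name is hypothetical.

```python
def cumulative_epsilon(epsilon_per_run: float, runs: int) -> float:
    """Total epsilon spent after `runs` independent releases (basic composition)."""
    if runs < 0:
        raise ValueError("runs must be non-negative")
    return epsilon_per_run * runs


for runs in (1, 3, 10):
    print(f"{runs} run(s) at epsilon=1.0 -> total epsilon {cumulative_epsilon(1.0, runs)}")
```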

10 of 11

Notebook (executed or unexecuted) captures all the steps in the analysis.

In this case, the data is noticeably different from the normal distribution used in the simulation.

11 of 11

We haven’t formally assessed the usability: If you try it out, please let us know!

Inside DP Wizard!

On GitHub!

At office hours!