OpenDP Usability and DP Wizard
Chuck McCallum (cmccallum@g.harvard.edu)
Solid foundation
Rust core / Floating point safe algorithms / Modular architecture
What have we done recently to improve usability?
Context API
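import opendp.prelude as dp

context = dp.Context.compositor(
    data=dataframe,
    privacy_unit=unit,    # e.g. dp.unit_of(contributions=1)
    privacy_loss=loss,    # e.g. dp.loss_of(epsilon=1.0)
    split_evenly_over=4,  # number of queries sharing the budget
)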
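As a rough illustration of what `split_evenly_over=4` implies, the sketch below divides a total epsilon into four equal per-query budgets. It assumes pure DP with basic sequential composition. The helper `split_budget` is hypothetical and is not part of OpenDP, whose compositor handles this internally.

```python
def split_budget(total_epsilon: float, n_queries: int) -> list[float]:
    """Divide a total epsilon evenly across n_queries (basic composition)."""
    if total_epsilon <= 0:
        raise ValueError("total_epsilon must be positive")
    if n_queries < 1:
        raise ValueError("n_queries must be at least 1")
    return [total_epsilon / n_queries] * n_queries


per_query = split_budget(total_epsilon=1.0, n_queries=4)
print(per_query)  # [0.25, 0.25, 0.25, 0.25]
```

Each query then has to be answered with a quarter of the overall budget, which is why asking for fewer queries leaves more accuracy for each one.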
Polars
import polars as pl

# DP sum of scores clamped to [0, 80], plus a DP row count
context.query().select(
    pl.col("score")
    .cast(int)
    .fill_null(0)
    .dp.sum(bounds=(0, 80)),
    dp.len(),
)
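To make the role of `bounds=(0, 80)` concrete, here is a standard-library sketch of the idea behind a bounded DP sum: fill missing values, clamp each value to the bounds, sum, and add Laplace noise scaled to the largest possible contribution. This is not how OpenDP implements it. The helper names are hypothetical, one contribution per individual is assumed, and naive floating-point sampling like this is exactly what OpenDP's floating-point-safe algorithms avoid.

```python
import random


def clamped_sum(values, lower, upper):
    """Replace missing values with 0, clamp to [lower, upper], and sum."""
    total = 0
    for v in values:
        v = 0 if v is None else v
        total += min(max(v, lower), upper)
    return total


def laplace_scale(lower, upper, epsilon, contributions=1):
    """Noise scale for a bounded sum when rows may be added or removed."""
    sensitivity = contributions * max(abs(lower), abs(upper))
    return sensitivity / epsilon


def noisy_sum(values, lower, upper, epsilon, rng=random):
    """Illustrative only: float sampling like this is not safe for real releases."""
    scale = laplace_scale(lower, upper, epsilon)
    # The difference of two i.i.d. exponentials is Laplace-distributed.
    noise = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
    return clamped_sum(values, lower, upper) + noise


scores = [10, None, 200, -5]
print(clamped_sum(scores, 0, 80))          # 90
print(laplace_scale(0, 80, epsilon=0.25))  # 320.0
print(noisy_sum(scores, 0, 80, epsilon=0.25))
```

Tighter bounds mean less noise but more clamping bias, which is why the bounds are part of the query and not inferred from the data.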
Plugins
dp.m.make_user_measurement(
    input_domain=domain,
    input_metric=metric,
    output_measure=measure,
    function=function,
    privacy_map=privacy_map,
    TO=type(constant),
)
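The `function` and `privacy_map` arguments are ordinary Python callables. A minimal sketch, assuming a measurement that ignores its input and always releases a fixed constant: the helper `make_constant_parts` is hypothetical, and the domain, metric, and measure that `make_user_measurement` also needs are not shown.

```python
def make_constant_parts(constant):
    """Build the two callables for a measurement that always releases `constant`."""

    def function(_data):
        # The release does not depend on the data at all.
        return constant

    def privacy_map(_d_in):
        # A data-independent release has zero privacy loss for any input distance.
        return 0.0

    return function, privacy_map


function, privacy_map = make_constant_parts("denied")
print(function([1, 2, 3]))  # denied
print(privacy_map(1))       # 0.0
```

For a real plugin, the privacy map carries the proof burden: it must bound the privacy loss for every pair of inputs within the given distance.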
Better Documentation
Getting Started / API / Theory / Contributing
DP Wizard
Local CSV → Preview → Results + Notebook
DP Wizard goals
$ pip install dp_wizard
$ dp-wizard --demo
The application runs locally, typically with a private CSV you provide.
You can also provide a public CSV for comparison.
Specify your dataset first, and then the details of your analysis.
(The goal: discourage users from tweaking contributions to get better results!)
Along the way, show OpenDP code samples.
(Full notebook download at the end.)
Select a column, and then specify your analysis of that column.
By itself, just asking for epsilon isn’t great for users new to DP, but with the interactive visualization…
[Figure: interactive visualization shown at ε=0.5 and ε=1.0, with estimated rows of 1,000 and 10,000]
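One way to see why ε is easier to choose alongside a row estimate: for a count with Laplace noise, the error bound depends only on ε, so its size relative to the data shrinks as the row count grows. The sketch below assumes sensitivity-1 counts and Laplace noise. `laplace_error_bound` is a hypothetical helper, not DP Wizard's code, and DP Wizard's own simulation may be computed differently.

```python
import math


def laplace_error_bound(epsilon, sensitivity=1.0, alpha=0.05):
    """Error t such that |noise| <= t with probability 1 - alpha."""
    scale = sensitivity / epsilon
    # For Laplace noise, P(|noise| > t) = exp(-t / scale).
    return scale * math.log(1 / alpha)


for epsilon in (0.5, 1.0):
    for rows in (1_000, 10_000):
        bound = laplace_error_bound(epsilon)
        print(f"epsilon={epsilon} rows={rows:>6,} "
              f"95% error <= {bound:.2f} ({bound / rows:.3%} of rows)")
```

Halving ε doubles the absolute error bound, while a tenfold increase in rows makes the same bound ten times smaller in relative terms.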
For now, we’re not trying to track a privacy budget across invocations: if you choose to rerun the analysis many times, or just look at the raw data, stopping you is out of scope for this tool.
If you don’t want to even make a DP release, that is also possible.
Notebook (executed or unexecuted) captures all the steps in the analysis.
In this case, the data is noticeably different from the normal distribution used in the simulation.
We haven’t formally assessed the usability: If you try it out, please let us know!
Inside DP Wizard!
On GitHub!
At office hours!