Using ODI tools for anonymisation threat modelling
Olivier Thereaux & Fionntán O’Donnell
theODI.org
ODI theory of change
We are one of many organisations working towards a good balance between encouraging and restricting how data is collected and used.
ODI Vision
We want people, organisations and communities to use data to make better decisions, and be protected from any harmful impacts.
That means...
Increasing access to data
...while retaining trust
Practical advocacy tools
Guides to Anonymisation
UKAN’s 12 steps
1. Describe the use case
2-4. Map the Data Ecosystem
5. Map the Legal Issues
6. Engage with Stakeholders
7. Evaluate the Data Situation
8-9. Select + implement the processes
10-12. Maintain trust
For some… we had tools to use
1. Describe the use case
Data Spectrum!
Data Ethics Canvas!
For others… the tools were set
5. Map the Legal Issues
(D)PIA!
GDPR!
We focused on...
1. Describe the use case
2-4. Map the Data Ecosystem
5. Map the Legal Issues
6. Engage with Stakeholders
7. Evaluate the Data Situation
8-9. Select + implement the processes
10-12. Maintain trust
Data Ecosystem Mapping
Actors: Beneficiaries, intermediaries, stewards, regulators...
Flow: Data and value, tangible and intangible
Could this help us discover and evaluate threats?
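One way to make the question concrete is to treat the ecosystem map as a small directed graph of actors and flows. The sketch below is only an illustration: the actor names, roles and flows are hypothetical and are not taken from the map built in the session. It lists which actors receive data, which is a natural first place to look for threats.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    source: str  # actor the flow starts from
    target: str  # actor the flow arrives at
    what: str    # what moves along the edge
    kind: str    # "data" or "value"


# Hypothetical actors and their roles, for illustration only.
ACTORS = {
    "Data steward": "steward",
    "Research team": "intermediary",
    "Public": "beneficiary",
    "Regulator": "regulator",
}

# Hypothetical flows of data and value between those actors.
FLOWS = [
    Flow("Data steward", "Research team", "raw records", "data"),
    Flow("Research team", "Public", "synthetic dataset", "data"),
    Flow("Public", "Data steward", "feedback", "value"),
    Flow("Regulator", "Data steward", "guidance", "value"),
]


def data_recipients(flows):
    """Return, for each actor, the data items that flow to it."""
    received = defaultdict(list)
    for flow in flows:
        if flow.kind == "data":
            received[flow.target].append(flow.what)
    return dict(received)


recipients = data_recipients(FLOWS)
for actor, items in recipients.items():
    print(f"{actor} ({ACTORS[actor]}) receives: {', '.join(items)}")
```

Every actor that appears in `recipients` is a point where data leaves one party's control, so each one is a candidate for a threat scenario.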
Data Ecosystem Mapping
A couple of hours later...
https://kumu.io/j-robert/synae-ecosystem-map
Threat Model
Classes of threats
Re-identification?
Membership attack?
Additional information about known subject?
All very unlikely…
But that may not be a good thing.
Three classes of actors
Insiders: Within the “secure” subset of the flow. They have access to the (raw) data before release.
Privileged access: Outsiders, but have access to related data through e.g. specific data sharing agreements. The risk of linking is higher for those.
General public: Everyone else. No specific privileged access to data.
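The three actor classes and the three threat classes from the threat model can be crossed to give a checklist of scenarios to discuss. The sketch below is a minimal illustration: the function names and the two boolean inputs (`sees_raw_data`, `has_related_data`) are assumptions made for this example, not part of any ODI tool.

```python
from enum import Enum
from itertools import product


class ActorClass(Enum):
    INSIDER = "insider"
    PRIVILEGED = "privileged access"
    GENERAL_PUBLIC = "general public"


class Threat(Enum):
    RE_IDENTIFICATION = "re-identification"
    MEMBERSHIP = "membership attack"
    ADDITIONAL_INFO = "additional information about known subject"


def classify_actor(sees_raw_data: bool, has_related_data: bool) -> ActorClass:
    """Place an actor in one of the three classes (hypothetical rule)."""
    if sees_raw_data:
        # Inside the "secure" subset of the flow, before release.
        return ActorClass.INSIDER
    if has_related_data:
        # Outsider holding related data, e.g. via a data sharing agreement.
        return ActorClass.PRIVILEGED
    return ActorClass.GENERAL_PUBLIC


def threat_scenarios():
    """Pair every actor class with every threat class."""
    return list(product(ActorClass, Threat))


scenarios = threat_scenarios()
for actor, threat in scenarios:
    print(f"{actor.value}: {threat.value}?")
```

Each of the nine pairs is a prompt for discussion rather than a risk score; judging how likely a scenario is remains a matter for the people around the table.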
Threat Scenarios
Classes of threats
False insights: Synthetic data is tricky. What if someone extracts wrong insights from it, derives flawed policies, or performs mistaken re-identification? What if someone tries to find themselves, and doesn’t?
Anonymisation process: What if the methodology is not solid enough? What if there is still easily re-identifiable information? Conversely, what if the utility of the synthetic data is too low?
Fear: If there is not enough confidence in the process, or not a good understanding of the tech, what if the synthetic data never gets released?
...
Utility-Risk Tradeoff
If risk is still high... Organisational/cultural barrier. No release.
But if risk is “too low”... That means utility is probably too low too.
… so what’s the point of synthetic data?
Is there much value in verisimilitude? Would much of the value be created with a detailed description of the data fields / a schema?
Discuss!
Useful exercise?
Data Ecosystem Map
Good for the data practitioner to communicate the ecosystem to other people. Grounded in reality, in the present.
Threat model
Good for others to help the data practitioner explore threats. Based in a hypothetical future.
Thank you
Stay in touch
The @ODIHQ team will soon be releasing
Let us know if you would like to be notified when they are out, or follow our tech team at @ODILabs.