1 of 27

Using ODI tools for anonymisation threat modelling

Olivier Thereaux & Fionntán O’Donnell

theODI.org

2 of 27

ODI theory of change

We are one of many organisations working towards a good balance between encouraging and restricting how data is collected and used.

3 of 27

ODI Vision

We want people, organisations and communities to use data to make better decisions, and be protected from any harmful impacts.

4 of 27

That means...

Increasing access to data

5 of 27

...while retaining trust

6 of 27

Practical advocacy tools

7 of 27

Guides to Anonymisation

8 of 27

UKAN’s 12 steps

1. Describe the use case
2-4. Map the Data Ecosystem
5. Map the Legal Issues
6. Engage with Stakeholders
7. Evaluate the Data Situation
8-9. Select and implement the processes
10-12. Maintain trust

9 of 27

For some… we had tools to use

1. Describe the use case

Data Spectrum

Data Ethics Canvas

10 of 27

For others… the tools were already set

5. Map the Legal Issues

(D)PIA

GDPR

11 of 27

We focused on...

1. Describe the use case
2-4. Map the Data Ecosystem
5. Map the Legal Issues
6. Engage with Stakeholders
7. Evaluate the Data Situation
8-9. Select and implement the processes
10-12. Maintain trust

12 of 27

Data Ecosystem Mapping

Actors: beneficiaries, intermediaries, stewards, regulators...

Flow: data and value, tangible and intangible

Could this help us discover and evaluate threats?
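The actors-and-flows structure maps naturally onto a directed graph. As a minimal sketch, assuming the map is held as a list of flows (the class, field and actor names below are hypothetical, not the format of any mapping tool), walking only the data flows lists every actor the data can reach, which gives a first candidate list for a threat model.

```python
from collections import deque
from dataclasses import dataclass, field

# Hypothetical role vocabulary, taken from the examples on the slide.
ROLES = {"beneficiary", "intermediary", "steward", "regulator"}


@dataclass(frozen=True)
class Flow:
    source: str
    target: str
    what: str              # e.g. "raw records", "synthetic data"
    kind: str              # "data" or "value"
    tangible: bool = True


@dataclass
class EcosystemMap:
    roles: dict[str, str] = field(default_factory=dict)  # actor -> role
    flows: list[Flow] = field(default_factory=list)

    def add_actor(self, name: str, role: str) -> None:
        if role not in ROLES:
            raise ValueError(f"unknown role: {role}")
        self.roles[name] = role

    def add_flow(self, flow: Flow) -> None:
        # Both ends of a flow must already be on the map.
        for end in (flow.source, flow.target):
            if end not in self.roles:
                raise KeyError(f"unknown actor: {end}")
        self.flows.append(flow)

    def reachable_by_data(self, start: str) -> set[str]:
        """Breadth-first search over data flows only (value flows are ignored)."""
        seen: set[str] = set()
        queue = deque([start])
        while queue:
            current = queue.popleft()
            for f in self.flows:
                if f.kind == "data" and f.source == current and f.target not in seen:
                    seen.add(f.target)
                    queue.append(f.target)
        seen.discard(start)  # a cycle back to the start is not "another" actor
        return seen


# Hypothetical example: a data steward commissions a synthetic release.
m = EcosystemMap()
m.add_actor("Hospital", "steward")
m.add_actor("Synthesis team", "intermediary")
m.add_actor("Researchers", "beneficiary")
m.add_flow(Flow("Hospital", "Synthesis team", "raw records", "data"))
m.add_flow(Flow("Synthesis team", "Researchers", "synthetic data", "data"))
m.add_flow(Flow("Researchers", "Hospital", "insight", "value", tangible=False))
print(sorted(m.reachable_by_data("Hospital")))  # ['Researchers', 'Synthesis team']
```

Value flows are kept on the map for communication but excluded from the walk, since only data flows carry re-identification risk in this sketch.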

13 of 27

Data Ecosystem Mapping

A couple of hours later...

14 of 27

https://kumu.io/j-robert/synae-ecosystem-map

15 of 27

Threat Model

16 of 27

Classes of threats

Re-identification?

Membership attack?

Additional information about a known subject?

All very unlikely…

But that may not be a good thing.
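One crude way to probe the first two questions is to check whether any synthetic record reproduces a combination of quasi-identifiers that belongs to exactly one real person. This is a minimal sketch of that uniqueness check, assuming records are Python dicts and that the analyst has already chosen the quasi-identifier columns; the function name and toy data are hypothetical, and a fuller assessment would also look at near matches (e.g. distance to closest record).

```python
from collections import Counter

Record = dict[str, object]


def unique_matches(real: list[Record], synthetic: list[Record],
                   quasi_identifiers: list[str]) -> list[Record]:
    """Return synthetic records whose quasi-identifier values match exactly one real record."""
    def key(record: Record) -> tuple:
        return tuple(record[q] for q in quasi_identifiers)

    # How many real people share each combination of quasi-identifiers.
    real_counts = Counter(key(r) for r in real)
    # A synthetic row pointing at a combination held by a single real person
    # is a crude signal of re-identification or membership-inference risk.
    return [s for s in synthetic if real_counts[key(s)] == 1]


# Hypothetical toy data.
real = [
    {"age": 34, "postcode": "N1", "diagnosis": "asthma"},
    {"age": 34, "postcode": "N1", "diagnosis": "flu"},
    {"age": 71, "postcode": "E2", "diagnosis": "diabetes"},
]
synthetic = [
    {"age": 34, "postcode": "N1", "diagnosis": "flu"},
    {"age": 71, "postcode": "E2", "diagnosis": "asthma"},
    {"age": 50, "postcode": "SE1", "diagnosis": "flu"},
]
hits = unique_matches(real, synthetic, ["age", "postcode"])
print(hits)  # [{'age': 71, 'postcode': 'E2', 'diagnosis': 'asthma'}]
```

The only hit is the synthetic 71-year-old in E2. Someone who knows such a person is in the source data could link them to that row, and would wrongly conclude they have asthma: the risk is real even when the "information" is false.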

17 of 27

Three classes of actors

Insiders: within the “secure” subset of the flow. They have access to the (raw) data before release.

Privileged access: outsiders who have access to related data, e.g. through specific data-sharing agreements. The risk of linkage is higher for them.

General public: everyone else, with no specific privileged access to data.
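The difference between the three classes can be framed as how many released columns each can tie back to real people. A minimal sketch, assuming each class is described only by whether it sees the raw data and which columns it already holds in other datasets (the class, function and column names are hypothetical):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActorClass:
    name: str
    sees_raw_data: bool                # insiders see the data before release
    auxiliary_columns: frozenset[str]  # columns held in other datasets


def linkage_surface(actor: ActorClass, released: set[str]) -> set[str]:
    """Released columns through which this actor could tie rows back to real people."""
    if actor.sees_raw_data:
        # Insiders need no linkage: every released column is already in their hands.
        return set(released)
    # Everyone else can only join on columns they already hold elsewhere.
    return set(actor.auxiliary_columns & released)


# Hypothetical release and auxiliary holdings for each class.
released = {"age", "postcode", "diagnosis"}
actors = [
    ActorClass("Insiders", True, frozenset()),
    ActorClass("Privileged access", False, frozenset({"age", "postcode", "patient_id"})),
    ActorClass("General public", False, frozenset({"postcode"})),
]
for a in sorted(actors, key=lambda a: len(linkage_surface(a, released)), reverse=True):
    print(a.name, sorted(linkage_surface(a, released)))
# Insiders ['age', 'diagnosis', 'postcode']
# Privileged access ['age', 'postcode']
# General public ['postcode']
```

Counting shared columns is a deliberately coarse proxy; it orders the classes the same way as the descriptions above but says nothing about how identifying each column is.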

18 of 27

Insiders

Within the “secure” subset of the flow.

They have access to the (raw) data before release.

19 of 27

Privileged access

Outsiders who have access to related data, e.g. through specific data-sharing agreements.

The risk of linkage is higher for them.

20 of 27

General public

Everyone else. No specific privileged access to data.

22 of 27

23 of 27

Threat Scenarios

24 of 27

Classes of threats

False insights: synthetic data is tricky. What if someone draws wrong insights from it, derives flawed policies, or makes a mistaken re-identification? What if someone tries to find themselves in it, and doesn’t?

Anonymisation process: what if the methodology is not robust enough? What if easily re-identifiable information remains? Conversely, what if the utility of the synthetic data is too low?

Fear: if there is not enough confidence in the process, or not a good enough understanding of the technology, what if the synthetic data never gets released?

...

25 of 27

Utility-risk trade-off

If risk is still high: organisational/cultural barriers mean no release.

But if risk is “too low”: utility is probably too low as well.

… so what’s the point of synthetic data?

Is there much value in verisimilitude? Would a detailed description of the data fields (a schema) create much of the same value?

Discuss!
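To make the schema question concrete: a minimal sketch, assuming records are Python dicts, that summarises each field's type, range and missing count without releasing any rows. The function name and the `max_categories` threshold are hypothetical choices, not part of any ODI or UKAN guidance.

```python
def describe_fields(records: list[dict], max_categories: int = 10) -> dict[str, dict]:
    """Summarise each field's type, range and missing count without releasing rows."""
    schema: dict[str, dict] = {}
    for name in sorted({k for r in records for k in r}):
        values = [r[name] for r in records if r.get(name) is not None]
        missing = len(records) - len(values)
        numeric = bool(values) and all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
        )
        if numeric:
            schema[name] = {"type": "number", "min": min(values),
                            "max": max(values), "missing": missing}
        else:
            distinct = sorted({str(v) for v in values})
            entry = {"type": "category", "distinct": len(distinct), "missing": missing}
            # Listing every category can itself disclose rare values,
            # so only small category sets are spelled out.
            if len(distinct) <= max_categories:
                entry["categories"] = distinct
            schema[name] = entry
    return schema


# Hypothetical toy records.
records = [
    {"age": 34, "postcode": "N1"},
    {"age": 71, "postcode": "E2"},
    {"age": 50, "postcode": None},
]
print(describe_fields(records))
```

Even a schema is not risk-free: min/max values and rare categories can point at individuals, so the same threat model applies, just with a much smaller surface.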

26 of 27

Useful exercise?

Data Ecosystem Map

Good for the data practitioner to communicate the ecosystem to other people. Grounded in present reality.

Threat model

Good for others to help the data practitioner explore threats. Based on a hypothetical future.

27 of 27

Thank you

Stay in touch

The @ODIHQ team will soon be releasing:

a primer on anonymisation and open data
a step-by-step guide to the UKAN decision-making framework

Let us know if you would like to be notified when they are out, or follow our tech team at @ODILabs.