Data privacy: What could possibly go wrong?
Prof Ben Rubinstein
The University of Melbourne
You call that a data release?
Partners in it’s-not-a-crime
Dr Chris Culnane
Castellate Consulting & UoM
A/Prof Vanessa Teague
Thinking Cybersecurity & ANU
Anything brilliant in this talk is due to them. Clumsy presentation is due to me.
With special thanks also to Anthony Carbines MP, Peter Tonoli, and David Watts.
Privacy is a wicked problem needing multiple disciplines
What can go wrong when lessons of some are ignored?
Let’s start with some examples from the Australian context.
The 2016 Medicare data release
C. Culnane, B. I. P. Rubinstein, and V. Teague.
Health data in an open world.
CoRR, abs/1712.05627, 2017.
August 2016 – MBS/PBS dataset is released
Medicare/Pharmaceutical Benefits Schedules
3 billion lines of data
Infographic from the Dept of Health
What protections were in place?
Encrypted Provider IDs and Patient PINs
Collapsed locations into 4 geographic regions
Year of birth
Removal of centenarians
Service & supply date perturbed by up to 14 days
Extremely low service volume items removed
(Encrypted) patient ID | 0345952108 |
Gender | F |
Year of birth | 1963 |
(Encrypted) patient ID | 0345952108 |
State | Vic-Tas |
Date | 7 Aug 1992 |
(Encrypted) supplier ID | 2340981234 |
Item code | 00023 (GP visit) |
Price paid by patient | $85 |
Price reimbursed by Medicare | $60 |
Various other details | … |
“GP” in U.S.: “Family Doc”. Or: HIV medicine, 2nd trimester labor, …
September 2016: Decryption reported to DHS
Decrypted Provider IDs in full
Aust Privacy Commissioner: Encryption dates back…
Instead: should have used RSA, AES, random IDs
We responsibly disclosed to Dept
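The "random IDs" option above can be sketched as follows. This is a minimal illustration only, not the scheme used in the release or a complete recommendation: the helper name `assign_random_ids`, the token length and the sample IDs are all assumptions made for the example.

```python
import secrets

def assign_random_ids(real_ids, nbytes=16):
    """Map each real ID to an independent random token (hypothetical helper).

    A token carries no information about the real ID, so there is nothing
    to decrypt. The mapping table must be kept secret or destroyed.
    """
    mapping = {}
    used = set()
    for real_id in real_ids:
        if real_id in mapping:
            continue  # the same provider always gets the same pseudonym
        token = secrets.token_hex(nbytes)
        while token in used:  # collisions are astronomically unlikely
            token = secrets.token_hex(nbytes)
        used.add(token)
        mapping[real_id] = token
    return mapping

# Made-up provider IDs, for illustration only.
provider_ids = ["P-0001", "P-0002", "P-0001"]
pseudonyms = assign_random_ids(provider_ids)
released = [pseudonyms[p] for p in provider_ids]
```

Random IDs only address recovery of the IDs themselves. They do nothing against linking records to people through the other fields in each record.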
Privacy Amendment (Re-identification Offence) Bill 2016
One day before agreed announcement day…
Attorney-General memo: Intention to amend Privacy Act 1988
The Bill
Life of a ReID Bill
12 Oct 2016 – Intro to the Senate
10 Nov 2016 – Ref to Senate Legal Committee
16 Dec 2016 – Consultation Period Ended
7 Feb 2017 – Senate Committee Report
. . .
6 Jun 2019 – Zombie Bill!
But even without passing, the retroactive Bill’s intended outcome was achieved: stifle disclosure of (existing) breaches
Overwhelmingly critical response
Law Council of Australia
Australian Bankers’ Association
14/15 submissions critical
Final Senate Committee report:
“The committee notes the concerns … However…the bill provides a necessary and proportionate response”
“Health Minister Sussan Ley insists the data, which was loaded onto the internet, does not identify patients.”
But health data about us abounds
Searching for Vanessa
17,310 women share her birthyear
59 also had children born 2006, 2011 in Australia
23 also based in Victoria
0 also match on child DOB, allowing for date perturbations
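The narrowing steps above amount to successive filters on quasi-identifiers. A minimal sketch on made-up records follows. The field names, the toy data, the region label "Region-X" and the helpers `within_days` and `candidates` are hypothetical. The 14-day window mirrors the date perturbation described for the release.

```python
from datetime import date

PERTURB_DAYS = 14  # the release shifted dates by up to 14 days

def within_days(d1, d2, days=PERTURB_DAYS):
    """True if two dates are at most `days` apart."""
    return abs((d1 - d2).days) <= days

def candidates(records, yob, state, child_dobs):
    """Return records consistent with what the searcher already knows."""
    out = []
    for r in records:
        if r["gender"] != "F" or r["yob"] != yob or r["state"] != state:
            continue
        # Every known child DOB must match some childbirth event,
        # allowing for the perturbation window.
        if all(any(within_days(dob, ev) for ev in r["birth_events"])
               for dob in child_dobs):
            out.append(r)
    return out

# Made-up records, for illustration only.
toy_records = [
    {"id": "A", "gender": "F", "yob": 1970, "state": "Vic-Tas",
     "birth_events": [date(2006, 3, 18), date(2011, 6, 25)]},
    {"id": "B", "gender": "F", "yob": 1970, "state": "Vic-Tas",
     "birth_events": [date(2006, 5, 1), date(2011, 7, 1)]},
    {"id": "C", "gender": "F", "yob": 1970, "state": "Region-X",
     "birth_events": [date(2006, 3, 10), date(2011, 7, 1)]},
]
known_dobs = [date(2006, 3, 10), date(2011, 7, 1)]
matches = candidates(toy_records, 1970, "Vic-Tas", known_dobs)
match_ids = [r["id"] for r in matches]
```

Each filter uses only information that family, friends or a public profile could supply, which is why a short chain of them can cut thousands of candidates down to one or none.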
Anyone could do this!!
Not in dataset
Mothers unique in MBS-PBS-10%
“It’s only a sample” or “bah humbug ‘confidence’”
DHS whole-of-population statistics on MBS billing rates
Reidentifications
Wiki/news articles on 18 mums with 2+ births
25 more queries
Other risks – only partially assessed
Fingerprinting by billing amounts
Melbourne Pharmaceutical Datathon with postcodes
A release born out of an open-data-first environment
The 2018 Myki data release
C. Culnane, B. I. P. Rubinstein, and V. Teague.
Stop the open data bus, we want to get off.
CoRR, abs/1908.05004, 2019.
July 2018 – Myki dataset is released
Myki
1.8 billion lines of data (touch-on/off events)
Wikimedia user Fiveapu
What protections were in place?
cardId
Apparently no change to times or locations
Apparently no removal of low volumes
cardId | 154449 |
Date-time | 2015-08-10 12:34:56 |
Touch type | Touch on |
Location info | Stop ID, route ID, etc. |
Card type | Type 48 – Transit Police Travel Pass |
!!! 74 card types: 371 Federal Police (type 46), 1232 Transit Police (type 48), 8 Federal Parliamentarians (type 50), 424 State Parliamentarians (type 51), 697k children 5-18, 179k secondary schoolers
[Figure: cardId values 1, 173211, 173338, 191920, 154449, 356913, 180637]
Searching for Chris and Ben
We had registered our Mykis, giving us access to 6 months of our own data, down to per-second events
* We weren’t “clever”: Ben 8-9am, 5-6pm; Chris 7-8am, 7-8pm
To validate further
Wikimedia public domain
In the dataset
Co-traveller analysis for Ben and Chris
Definition: Two cards are co-travellers if they touch on at the same stop within 5 seconds of one another.
In the 18mo period from 2017 onwards
38 repeat co-travellers
363 repeat co-travellers
Conclusion: co-travellers are rare and repeats rarer; the ease of finding family members/close partners is concerning
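The co-traveller definition above can be sketched on made-up touch-on events as follows. The event layout, the card and stop names, and the reading of "repeat" as co-travelling on at least two occasions are assumptions for illustration, not the method of the report.

```python
from collections import Counter, defaultdict
from datetime import datetime, timedelta

WINDOW = timedelta(seconds=5)  # "within 5 seconds of one another"

def co_traveller_counts(events, target_card):
    """Count, per other card, how many of the target's touch-ons it shares.

    events: list of (card_id, stop_id, timestamp) touch-on events.
    """
    by_stop = defaultdict(list)
    for card, stop, ts in events:
        by_stop[stop].append((ts, card))
    counts = Counter()
    for card, stop, ts in events:
        if card != target_card:
            continue
        # Other cards touching on at the same stop inside the window.
        nearby = {c for t, c in by_stop[stop]
                  if c != target_card and abs(t - ts) <= WINDOW}
        counts.update(nearby)
    return counts

def repeat_co_travellers(counts, min_times=2):
    """Cards that co-travelled with the target at least `min_times` times."""
    return {card for card, n in counts.items() if n >= min_times}

# Made-up events, for illustration only.
t0 = datetime(2018, 1, 8, 8, 15, 0)
day = timedelta(days=1)
toy_events = [
    ("A", "stop1", t0),
    ("B", "stop1", t0 + timedelta(seconds=3)),
    ("C", "stop1", t0 + timedelta(seconds=9)),
    ("A", "stop2", t0 + day),
    ("B", "stop2", t0 + day + timedelta(seconds=4)),
    ("D", "stop2", t0 + day - timedelta(seconds=2)),
]
counts = co_traveller_counts(toy_events, "A")
repeats = repeat_co_travellers(counts)
```

On real data the per-stop scan would need sorting or indexing by time to scale to billions of events; the brute-force loop here is only meant to make the definition concrete.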
Searching for a Friend
Chris searched for Peter Tonoli* in the data
To validate further
Conclusion: particularly concerning for domestic violence cases
* Peter consented to our conducting and reporting on this search
Wikimedia user Wrev
In the dataset
Searching for a Stranger
Can we search for a public figure?
Anthony Carbines MP (State Member for Ivanhoe)
We linked* the Myki dataset to this prior knowledge
* Mr Carbines consented to inclusion in our report
In the dataset
September 2018 – Responsible disclosures
Timeline of responsible disclosures
Breach of Privacy and Data Protection Act 2014 (VIC)
Five-Safes cracking
Wikimedia user Jon.lorquet
C. Culnane, B. I. P. Rubinstein, and D. Watts.
Not fit for purpose: A critical analysis of the ’five safes’.
CoRR, abs/2011.02142, 2020.
Introducing… 5 Safes
“The Five Safes framework takes a multi-dimensional approach to managing disclosure risk. Each safe refers to an independent but related aspect of disclosure risk. The framework poses specific questions to help assess and describe each risk aspect (or safe) in a qualitative way. This allows data custodians to place appropriate controls, not just on the data itself, but on the manner in which data is accessed. The framework is designed to facilitate safe data release and prevent over-regulation.”
– ABS, Five Safes Framework – Data Confidentiality Guide
safe people
safe projects
safe settings
safe data
safe outputs
Historical context for 4-5 Safes
Introduced in 2002 by Felix Ritchie, UK Office for National Statistics
Initially a data protection framework to guide risk management, it has grown into a cornerstone of data sharing and access policy and legislation
Critical analysis (partial list)
Genesis of the 5 Safes: a mindset of avoiding over-regulation
Emotive and appropriated language
Simplistic guidance across safes
“The Data Sharing Principles looks suspiciously like the Five Safes.” – Justin Warren, EFA Board
Why stop at 4-5 Safes: Safer, safest?
Pages 18-19
Closer look at “safe data” (IANAL)
Data Availability and Transparency Code 2022
Data Availability and Transparency Act 2022
The Privacy Act 1988 defines “de-identified” data (note changes coming to Act)
Does usage of 5 Safes promote PETs?
Google searches conducted Wed 14 Feb 2024
Looking again at “safe data” – 2018 ACS Guide
What can we do?
We need (some) hammers
Golden hammer (aka law of the instrument, law of the hammer, Maslow's hammer/gavel): A cognitive bias that involves an over-reliance on a familiar tool. Abraham Maslow wrote in 1966, "If the only tool you have is a hammer, it is tempting to treat everything as if it were a nail." (Wikipedia)
Not all hammers are golden
Hammers from computer science that are sometimes missing in privacy management
Consider differential privacy
Threat model
Definition / security property
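For reference, the standard definition: a randomised mechanism M is ε-differentially private if, for every pair of datasets D and D' differing in one record and every set of outputs S, Pr[M(D) ∈ S] ≤ e^ε · Pr[M(D') ∈ S]. The property belongs to the mechanism, not to any one output dataset. A minimal sketch of the Laplace mechanism for a counting query follows. The names `laplace_noise` and `dp_count` and the toy data are assumptions for illustration, and this is not a hardened implementation (floating-point details matter in practice).

```python
import random

def laplace_noise(scale, rng=random):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    # The difference of two iid exponentials with mean `scale`
    # is Laplace-distributed with scale `scale`.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def dp_count(records, predicate, epsilon, rng=random):
    """Noisy count: true count plus Laplace noise of scale 1/epsilon.

    A count changes by at most 1 when one record is added or removed
    (sensitivity 1), so this scale gives epsilon-DP for the query.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)

# Made-up records, for illustration only.
toy_ages = [23, 35, 41, 52, 67, 70, 18, 44]
rng = random.Random(0)
noisy = dp_count(toy_ages, lambda age: age >= 40, epsilon=1.0, rng=rng)
```

Smaller ε means more noise and a stronger guarantee. Repeating the query spends more of the privacy budget, since averaging many noisy answers recovers the true count.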
DeID as property of datasets… (compare to DP)
Recommendations out of Myki experience
OVIC’s outcomes for public data custodians
Our report’s recommendations
Cf. 2017 DP for Opal data: Data61 proposal, our response, and a counter
Closing Proposal: Avoiding death by N safes
Unbounded N safes?
A modest (?) proposal
Thank you!
As a parting note…
The golden hammer of criminalising re-identification is back…
UK passed the Data Protection Act 2018 criminalising the knowing reidentification of “anonymised” data
Australia’s AG has (Feb 2023) made a proposal as part of the Privacy Act reforms: "Proposal 4.7: Consult on introducing a criminal offence for malicious re-identification of de-identified information where there is an intention to harm another or obtain an illegitimate benefit, with appropriate exceptions."
Govt has already agreed in their Sept 2023 response…