Stop saving IP addresses (!?)
Matt Mathis
mattmathis@measurementlab
Original presentation: Mar 16 2022
Our Goal Today
Outline
Our Privacy Statement
We need to identify repeated tests from single clients
10 clients contribute 2%
100 contribute 6%
1000 contribute 8%
3474 (0.012%) contribute 10%
10000 contribute 12%
1% contribute 30%
10% contribute 58%
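Figures like these can be derived by counting tests per client and summing the largest counts. The following is a minimal sketch, assuming the percentages are shares of all tests and that each test record carries some stable client key; the function name `top_client_share` and the sample data are hypothetical, not MLab's actual query.

```python
from collections import Counter


def top_client_share(client_ids, n):
    """Return the fraction of all tests contributed by the n busiest clients."""
    counts = Counter(client_ids)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # most_common(n) yields (client, test_count) pairs, busiest first.
    return sum(count for _, count in counts.most_common(n)) / total


# Made-up sample: one client key per test.
tests = ["a"] * 6 + ["b"] * 3 + ["c"]
print(top_client_share(tests, 1))  # 0.6
```

Without a stable per-client key such as the IP address, this grouping is not possible, which is why the question of what to save matters.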
We need to strike a balance
Please contribute your thoughts: discuss@measurementlab.net
Some (weakly) related secondary problems
The Current Platform
MLab overview (simplified)
[Diagram. Components shown: Fleet of Measurement Nodes (NDT + sidecar services: TCP info, .pcap, traceroute, Annotator, Pusher; UUIDs); GCS Archive (NDT (UUIDs), Annotations, TCP info, .pcap, Traceroute); ETL Pipeline (Gardener, Parse, Join); BigQuery (Views); Locate Service (incoming requests); two "Publish" steps.]
Elements of the Fleet
Elements of the Pipeline
Approaches
General strategy
Possible annotations
Hashes to obfuscate IP addresses
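One way to read this title: store a keyed hash in place of the raw address, so that repeated tests from the same address still group together. The following is a minimal sketch, assuming HMAC-SHA256 with a secret key; the function name `obfuscate_ip`, the key handling, and the 16-character truncation are illustrative assumptions, not MLab's design. An unkeyed hash would not help much for IPv4, since all 2^32 addresses can be hashed and compared by brute force.

```python
import hashlib
import hmac
import ipaddress


def obfuscate_ip(ip: str, key: bytes) -> str:
    """Return a keyed, truncated hash standing in for an IP address."""
    # Parsing validates the input and normalizes equivalent spellings
    # (for example, compressed and expanded IPv6 forms).
    addr = ipaddress.ip_address(ip)
    digest = hmac.new(key, addr.packed, hashlib.sha256).hexdigest()
    # Truncation length is an arbitrary choice for this sketch.
    return digest[:16]


key = b"example-secret-key"  # hypothetical; a real key would be kept secret
token = obfuscate_ip("192.0.2.77", key)
print(token == obfuscate_ip("192.0.2.77", key))  # True: same address, same token
```

Rotating the key would break linkage across the rotation boundary, which trades longitudinal analysis for privacy.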
Possible annotations provided by the client
Redacting IP addresses in the Fleet
Redacting IP addresses in the parser
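Whether it happens in the fleet or in the parser, one simple form of redaction is to zero the host bits of each address and keep only a network prefix. The following is a minimal sketch under that assumption; the function name `redact_ip` and the /24 and /48 prefix lengths are hypothetical choices, not values stated in this presentation.

```python
import ipaddress


def redact_ip(ip: str, v4_prefix: int = 24, v6_prefix: int = 48) -> str:
    """Zero the host bits of an address, keeping only the network prefix."""
    addr = ipaddress.ip_address(ip)
    prefix = v4_prefix if addr.version == 4 else v6_prefix
    # strict=False lets ip_network accept an address with host bits set
    # and mask them off.
    net = ipaddress.ip_network(f"{addr}/{prefix}", strict=False)
    return str(net.network_address)


print(redact_ip("192.0.2.77"))             # 192.0.2.0
print(redact_ip("2001:db8:1234:5678::1"))  # 2001:db8:1234::
```

Unlike a keyed hash, truncation keeps coarse network locality but merges all clients that share a prefix.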
Acceptable Use Agreements (AUA)
Server-side NAT (aka S-NAT, L4 routing, etc.)
Risks
Potential Leaks
What about traceroute?
IPv4 specific risks
IPv6 specific risks
Impaired Experiments
Understanding repeated tests from single IPs
Understanding shared IP addresses
Understanding IP address stability
Validating MLab itself
Curated clients
Questions and discussions
Discuss
But what is PII?
GDPR Article 4, Definitions, paragraph 1:
‘personal data’ means any information relating to an identified or identifiable natural person (‘data subject’); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person;
Do we have a person? Can our data indirectly identify a person? I think not....
Can somebody help us find a strong answer to this question?
My wish
Our official statement could become:
MLab collects IP addresses but does not collect any information that might be used to directly or indirectly identify you, the user.
This statement is already true; we just don't use the words.