Automating Cookie Consent and GDPR Violation Detection
Dino Bollinger, Karel Kubicek, Carlos Cotrini, David Basin
31st USENIX Security Symposium (August 11, 2022)
Cookie consent
ePrivacy Directive and General Data Protection Regulation (GDPR)
ePrivacy Directive:
GDPR Consent:
Non-compliance is widespread
(Utz 2019, Trevisan 2019, Matte 2020, Nouwens 2020, Kampanos 2021, Santos 2021, etc.)
(Libert 2018, Trevisan 2019, Matte 2020, Nouwens 2020, etc.)
(Bösch 2016, Grassl 2020, Hasner 2021, Sanchez-Rola 2019, Htut Soe 2020, etc.)
Goal: Enforce cookie consent on client-side while browsing.
Our solution: CookieBlock
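A minimal sketch of what client-side enforcement could look like, assuming a purpose predictor is available. The names `Cookie`, `enforce_consent`, and `predict_purpose` are hypothetical and not taken from the CookieBlock source; the rule that strictly necessary cookies are always kept is likewise an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cookie:
    # Hypothetical minimal cookie record for illustration.
    name: str
    domain: str
    value: str


def enforce_consent(cookies, predict_purpose, consented):
    """Split cookies into (kept, removed) based on the user's choices.

    predict_purpose: callable mapping a Cookie to a purpose label.
    consented: set of purpose labels the user agreed to.
    Assumption: "necessary" cookies are kept regardless of consent.
    """
    kept, removed = [], []
    for cookie in cookies:
        purpose = predict_purpose(cookie)
        if purpose == "necessary" or purpose in consented:
            kept.append(cookie)
        else:
            removed.append(cookie)
    return kept, removed


# Toy predictor standing in for a trained classifier.
def toy_predictor(cookie):
    return {"sessionid": "necessary", "_ga": "analytics"}.get(cookie.name, "advertising")


jar = [
    Cookie("sessionid", "example.org", "abc"),
    Cookie("_ga", "example.org", "GA1.2.1"),
    Cookie("ad_id", "ads.example", "xyz"),
]
kept, removed = enforce_consent(jar, toy_predictor, consented={"analytics"})
```

In a real extension this decision would run whenever the browser reports a cookie change; the sketch covers only the policy step.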
Implementation:
Data collection: selecting data sources
Data collection: web crawlers
HTTP GET
Image source: Mozilla Firefox (https://commons.wikimedia.org/wiki/File:Firefox_logo,_2017.svg)
The Firefox logo is a trademark of the Mozilla Foundation in the U.S. and other countries.
Data collection: results of OpenWPM crawl
Feature extraction from textual cookies
52 types in total, including:
Ex 1: Shannon entropy
Ex 2: Content encodings
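The two example features named on this slide could be computed roughly as follows. This is a sketch under assumptions: entropy is measured in bits per character (the slides do not say whether it is normalized), and the set of encodings checked here (URL encoding, hex, Base64, JSON) is illustrative rather than the list used in the talk. `shannon_entropy` and `detect_encodings` are hypothetical names.

```python
import base64
import binascii
import json
import math
import re
from collections import Counter
from urllib.parse import unquote


def shannon_entropy(value: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    if not value:
        return 0.0
    total = len(value)
    return sum(-(n / total) * math.log2(n / total) for n in Counter(value).values())


def detect_encodings(value: str) -> dict:
    """Flag which encodings a cookie value is consistent with (not exclusive)."""
    flags = {
        "url_encoded": unquote(value) != value,
        "hex": bool(re.fullmatch(r"(?:[0-9a-fA-F]{2})+", value)),
        "base64": False,
        "json": False,
    }
    try:
        # validate=True rejects characters outside the Base64 alphabet.
        base64.b64decode(value, validate=True)
        flags["base64"] = len(value) > 0
    except (binascii.Error, ValueError):
        pass
    try:
        # Cookie values often carry URL-encoded JSON, so decode first.
        flags["json"] = isinstance(json.loads(unquote(value)), (dict, list))
    except ValueError:
        pass
    return flags
```

Random identifiers tend to score high on entropy, while short flags such as `true` score low. The encoding flags overlap by design: an even-length hex string whose length is a multiple of four is also valid Base64.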
XGBoost classifier and baseline
[Figure: example decision tree with split nodes "entropy > 0.8" and "Session?" (True/False branches) and leaf scores -3, +3, -1. Prediction: Advertising]
XGBoost
Cookiepedia
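To illustrate how a boosted tree ensemble arrives at a prediction, here is a sketch in plain Python. In multi-class gradient boosting, each class has its own trees, their leaf scores are summed, and a softmax turns the sums into probabilities. The trees below are hypothetical: they reuse the split conditions and leaf scores visible in the figure, but the arrangement of leaves and the second tree are assumptions, not the trained model.

```python
import math


def advertising_tree(features):
    # Hypothetical arrangement of the splits shown on the slide.
    if features["entropy"] > 0.8:
        return -1.0 if features["session"] else 3.0
    return -3.0


def necessary_tree(features):
    # Invented second tree so the ensemble has two classes to compare.
    return 2.0 if features["session"] else -2.0


TREES = {
    "advertising": [advertising_tree],
    "necessary": [necessary_tree],
}


def predict(features):
    """Sum leaf scores per class, apply softmax, return (label, probabilities)."""
    scores = {
        label: sum(tree(features) for tree in trees)
        for label, trees in TREES.items()
    }
    norm = sum(math.exp(s) for s in scores.values())
    probs = {label: math.exp(s) / norm for label, s in scores.items()}
    return max(probs, key=probs.get), probs


label, probs = predict({"entropy": 0.93, "session": False})
```

With these toy trees, a high-entropy persistent cookie gets the advertising score +3 against a necessary score of -2, so `label` is `"advertising"`.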
Classifier evaluation
Cookiepedia balanced accuracy 84.7% ± 0.3%
XGBoost balanced accuracy 84.2% ± 0.27%
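For reference, balanced accuracy is commonly defined as the mean of the per-class recalls, which keeps a dominant class from hiding poor results on a rare one. The sketch below assumes that standard definition; the labels and data are made up for illustration.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall over the classes present in y_true."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        totals[truth] += 1
        hits[truth] += int(truth == pred)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# Toy data: 8 advertising cookies, 2 functionality cookies.
y_true = ["ads"] * 8 + ["func"] * 2
y_pred = ["ads"] * 8 + ["ads", "func"]

plain = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
balanced = balanced_accuracy(y_true, y_pred)
```

On this toy data plain accuracy is 0.9, while balanced accuracy is 0.75 because only half of the functionality cookies are recognized.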
CookieBlock browser extension
Empirical evaluation:
Potential violations: per type
Potential violations: wrong purpose and undeclared cookies
Undeclared cookies
GDPR informed consent requirement
Image sources: https://commons.wikimedia.org/wiki/File:Twemoji2_1f36a.svg; https://commons.wikimedia.org/wiki/File:Question_mark_alternate.svg
The Twemoji cookie image is licensed under the Creative Commons Attribution 4.0 International license, and has been altered from its original form. We claim no ownership of the image.
“Google Analytics” cookie with wrong purpose
Decision from Planet49 case
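The two checks on this slide can be sketched as set and dictionary comparisons between what a site declares and what a crawl observes. The function names and the small reference table `KNOWN_PURPOSES` are hypothetical; a real analysis would need a vetted list of well-known cookies.

```python
# Hypothetical reference table of cookies with a well-established purpose.
KNOWN_PURPOSES = {"_ga": "analytics", "_gid": "analytics"}


def find_undeclared(observed_names, declared):
    """Cookies seen in the browser but absent from the site's declaration."""
    return sorted(set(observed_names) - set(declared))


def find_wrong_purpose(declared, reference=KNOWN_PURPOSES):
    """Well-known cookies declared under a purpose other than the expected one.

    Returns {cookie name: (declared purpose, expected purpose)}.
    """
    return {
        name: (declared[name], expected)
        for name, expected in reference.items()
        if name in declared and declared[name] != expected
    }


declared = {"_ga": "necessary", "sessionid": "necessary"}
observed = ["_ga", "sessionid", "ad_id"]

undeclared = find_undeclared(observed, declared)
wrong = find_wrong_purpose(declared)
```

Here `ad_id` is reported as undeclared, and `_ga` is reported as declared "necessary" where "analytics" is expected.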
Potential violations: implicit and ignored consent
Cookies set prior to user’s consent
Article 5(3) of the ePrivacy Directive
Cookies set despite negative consent
Article 5(3) of the ePrivacy Directive
Image source: https://icons8.com/icons/set/fast and https://icons8.com/icons/set/consent
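Both checks on this slide reduce to comparing the time a cookie was set with the time of the user's consent decision. The sketch below assumes a crawl log with timestamps and a purpose per cookie; `CookieEvent` and both function names are hypothetical, and treating only "necessary" cookies as exempt is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CookieEvent:
    name: str
    purpose: str   # declared or predicted purpose
    set_at: float  # seconds since page load


def implicit_consent(events, consent_at):
    """Non-necessary cookies set before the user made any choice."""
    return [e.name for e in events
            if e.purpose != "necessary" and e.set_at < consent_at]


def ignored_consent(events, consent_at, rejected):
    """Cookies of a rejected purpose that were set after the rejection."""
    return [e.name for e in events
            if e.purpose in rejected and e.set_at >= consent_at]


events = [
    CookieEvent("sid", "necessary", 0.1),
    CookieEvent("_ga", "analytics", 0.4),
    CookieEvent("ad_id", "advertising", 5.0),
]
early = implicit_consent(events, consent_at=3.0)
ignored = ignored_consent(events, consent_at=3.0, rejected={"advertising"})
```

In this example `_ga` is set before the decision at 3.0 s, and `ad_id` appears after advertising was rejected.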
Potential violations: histogram
Conclusion
More info, source, extension links: https://karelkubicek.github.io/post/cookieblock
Dino Bollinger, Karel Kubicek, Dr. Carlos Cotrini, Prof. David Basin
Backup Slides
Consent management platforms: market share & analysis
CMP | Market share | Remote | Labels |
Osano | 2.25% | ✓ | ✗ |
Cookie Notice | 1.29% | ✗ | ✗ |
OneTrust | 1.17% | ✓ | ✓ |
OptAnon | 1.08% | ✓ | ✓ |
Cookie Law Info | 0.95% | ✗ | ✗ |
Cookiebot | 0.77% | ✓ | ✓ |
Quantcast CMP | 0.68% | ✓ | ✗ |
UK Cookie Consent | 0.33% | ✗ | ✗ |
TrustArc | 0.26% | ✓ | ✗ |
WP GDPR Comp. | 0.20% | ✗ | ✗ |
Moove GDPR Comp. | 0.18% | ✗ | ✗ |
tarteaucitron.js | 0.16% | ✗ | ✗ |
Usercentrics | 0.16% | ✓ | ✗ |
CookiePro | 0.15% | ✓ | ✓ |
Borlabs Cookie | 0.12% | ✗ | ✓ |
EU Cookie Law | 0.12% | ✗ | ✓ |
PrimeBox CookieBar | 0.09% | ✗ | ✗ |
Cookie Script | 0.07% | ✓ | ✓ |
Cookie Information | 0.06% | ✓ | ✓ |
Termly | 0.05% | ✓ | ✓ |
Cookie Info Script | 0.05% | ✓ | ✗ |
Easy GDPR | 0.04% | ✓ | ✗ |
Feature importance
How many times is the feature used
High weight → feature used close to the leaf
How many cookies were influenced by the feature
High importance → feature close to the root
Model precision and recall
Category | Cookiepedia precision | Cookiepedia recall | XGBoost precision | XGBoost recall |
Strictly necessary | 88.5% | 94.5% | 81.7% | 87.3% |
Functionality | 78.7% | 38.1% | 76.3% | 52.9% |
Performance/analytics | 93.0% | 84.2% | 89.7% | 89.8% |
Tracking/advertising | 79.0% | 94.9% | 89.8% | 93.6% |
Cookiepedia accuracy 86.1% ± 0.1%
XGBoost accuracy 87.2% ± 0.23%
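As a reminder of how per-class precision and recall are derived from predictions, here is a small sketch. Precision for a class is the share of predictions of that class that are correct; recall is the share of true members of that class that are found. The labels and data are invented for illustration and are unrelated to the reported results.

```python
from collections import Counter


def precision_recall(y_true, y_pred):
    """Return {class: (precision, recall)} for every class that occurs."""
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    result = {}
    for cls in sorted(set(true_counts) | set(pred_counts)):
        precision = true_pos[cls] / pred_counts[cls] if pred_counts[cls] else 0.0
        recall = true_pos[cls] / true_counts[cls] if true_counts[cls] else 0.0
        result[cls] = (precision, recall)
    return result


# Toy data: functionality cookies are often mistaken for advertising.
y_true = ["func", "func", "func", "ads"]
y_pred = ["func", "ads", "ads", "ads"]
scores = precision_recall(y_true, y_pred)
```

The toy data produces high precision but low recall for `func`, the same pattern the slide reports for the Functionality category.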
CookieBlock: manual evaluation
Related work – extensions
Related work – comparable approaches
Accuracy: 0.867
Violation detection: outliers, conflicting purposes
Conflicting purposes
Non-ambiguous requirement of GDPR
Image sources: https://commons.wikimedia.org/wiki/File:Twemoji2_1f36a.svg
The Twemoji cookie image is licensed under the Creative Commons Attribution 4.0 International license, and has been altered from its original form. We claim no ownership of the image.
Outlier purpose from majority opinion
Lower bound, indicates misbehavior
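Both checks on this slide can be sketched over a list of (site, cookie name, declared purpose) records. The function names and the `min_share` threshold are hypothetical; the slides do not state how large a majority must be before a deviating declaration counts as an outlier.

```python
from collections import Counter, defaultdict


def conflicting_purposes(declarations):
    """Cookies that a single site declares under more than one purpose."""
    seen = defaultdict(set)
    for site, name, purpose in declarations:
        seen[(site, name)].add(purpose)
    return {key: sorted(p) for key, p in seen.items() if len(p) > 1}


def majority_outliers(declarations, min_share=0.8):
    """Declarations that deviate from a clear majority purpose for a cookie.

    Returns (site, cookie name, declared purpose, majority purpose) tuples.
    min_share is an assumed threshold for what counts as a clear majority.
    """
    by_name = defaultdict(list)
    for site, name, purpose in declarations:
        by_name[name].append((site, purpose))
    outliers = []
    for name, entries in by_name.items():
        majority, count = Counter(p for _, p in entries).most_common(1)[0]
        if count / len(entries) >= min_share:
            outliers += [(site, name, p, majority)
                         for site, p in entries if p != majority]
    return outliers


decls = [(f"site{i}.example", "_ga", "analytics") for i in range(9)]
decls += [
    ("odd.example", "_ga", "necessary"),
    ("odd.example", "pref", "functionality"),
    ("odd.example", "pref", "advertising"),
]
outliers = majority_outliers(decls)
conflicts = conflicting_purposes(decls)
```

Nine of ten sites label `_ga` as analytics, so the one "necessary" declaration is flagged; `pref` has no clear majority and is reported only as a conflict.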
Violation detection: unclassified cookies, incorrect expiry
Incorrect expiry
Violation in Planet49 case
Image source: https://commons.wikimedia.org/wiki/File:Twemoji_1f565.svg
The Twemoji clock image is licensed under the Creative Commons Attribution 4.0 International license. We claim no ownership of the image.
Unclassified cookies
Informed consent requirement of GDPR
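A sketch of the two checks named on this slide, assuming each declaration carries a purpose and an expiry in seconds. The `Declared` record, the function names, and the tolerance `factor=1.5` are assumptions for illustration; the slides do not state which threshold separates an incorrect expiry from a rounding difference.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Declared:
    name: str
    purpose: Optional[str]  # None or "unclassified" means no purpose given
    expiry_s: float         # declared lifetime in seconds


def unclassified(declared):
    """Cookies listed in the declaration without a usable purpose."""
    return [d.name for d in declared
            if not d.purpose or d.purpose.lower() == "unclassified"]


def incorrect_expiry(declared, observed_expiry_s, factor=1.5):
    """Cookies whose observed lifetime exceeds the declared one by `factor`.

    Returns (name, declared seconds, observed seconds) tuples.
    """
    flagged = []
    for d in declared:
        actual = observed_expiry_s.get(d.name)
        if actual is not None and actual > factor * d.expiry_s:
            flagged.append((d.name, d.expiry_s, actual))
    return flagged


DAY = 86400
declared = [
    Declared("_ga", "analytics", 30 * DAY),
    Declared("mystery", None, DAY),
    Declared("sid", "necessary", DAY),
]
observed = {"_ga": 730 * DAY, "sid": DAY}

no_purpose = unclassified(declared)
too_long = incorrect_expiry(declared, observed)
```

In this example `_ga` is declared to last 30 days but observed with a two-year lifetime, and `mystery` has no declared purpose.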
Violation statistics, repeated results after 1 year
May 2021 crawl (29’206 websites)
Cookiebot: 45.8%, OneTrust: 52.1%, Termly: 2.2%
July 2022 crawl (52’162 websites)
Cookiebot: 57.9%, OneTrust: 39.6%, Termly: 2.6%
Violation statistics, grouped by CMP
May 2021 crawl
July 2022 crawl
Presentation Authors:
Dino Bollinger, dino.bollinger@gmail.com
Karel Kubicek, karel.kubicek@inf.ethz.ch
Team:
Dino Bollinger, Karel Kubicek, Dr. Carlos Cotrini, Prof. David Basin
ETH Zurich
D-INFK
Institute of Information Security
https://informationsecurity.ethz.ch/