
Ethical and privacy-preserving internet-mediated research

Dr. Mainack Mondal (IIT Kharagpur, India) and Dr. Guillermo Suarez-Tangil (IMDEA Networks/KCL)

Tutorial: Tracking the Trackers

WebSci’21, virtual venue


Internet-mediated research

“Research conducted through the medium of the Internet”

-- Clifford et al., 2010

Encompasses almost all research in the web science community


Tracking as internet-mediated research


Web tracking

Web tracking is the practice by which operators of websites collect, store and share information about visitors’ activities

  • But it is not only website operators who track
    • Tracking can also be used for research
    • A Swiss Army knife for understanding an online system

  • Tracking as Internet-mediated research
    • Collecting data to understand the system dynamics and the behavior of its actors
    • Actors include users and third-party trackers on websites
    • Often leverages data about or generated by users, which raises ethics and privacy concerns


Outline

  • Internet-mediated research
  • The Belmont Report
  • Basic ethical principles
  • Theories of data privacy
  • Ethics and privacy of Internet-mediated research


Ethics of human subjects research: the Belmont Report


The Belmont Report

  • Timeline
    • 1974 (National Research Act)
    • 1976 (Belmont Conference)
  • Objective
    • Reassurance that abuses are prevented
    • Deal with threats to human values
  • Scope
    • Designed for biomedical and behavioral research
    • An analytical framework to help understand the ethical issues involved



Basic ethical principles

  • The Belmont Report
    • General judgments that serve as a basic justification for human actions
    • Grounding for many particular ethical prescriptions and evaluations
  • Menlo Report
    • Based on the Belmont Report
    • Applies it to ICT research, including cybersecurity R&D
  • Principles
    • Respect for persons
    • Beneficence
    • Justice
    • Respect for law and public interest (added by the Menlo Report)


Basic ethical principles / Respect for persons

  • Participation
    • Must be voluntary
    • Requires informed consent
  • Individuals
    • Treated as autonomous agents
    • Need protection when they have diminished autonomy
  • Collateral subjects
    • Respect individuals who are not targets of the research but are still impacted by it
    • Case study (Cambridge Analytica): data collected from Facebook user X also contained data about X’s friends, who never consented


Basic ethical principles / Beneficence

  • Do no harm
  • Trade-off
    • Maximize probable benefits
    • Minimize probable harms
  • Systematic assessment
    • Risk of harm
    • Benefit


Basic ethical principles / Justice

  • Equal consideration in how subjects are treated
  • The benefits of research should be fairly distributed
  • Selection of subjects should be fair
  • Burdens should be allocated equitably across impacted subjects


Basic ethical principles / Respect for law

  • Engage in legal due diligence
  • When analyzing or operating in public services and companies:
    • Respect the Terms of Use
  • Be transparent in methods and results
  • Conduct research in the public interest
  • Be accountable for actions



What is privacy in human subjects research?


Privacy: definitions

  • Privacy has a long history of definitions
    • 1890: Warren and Brandeis (Law)
    • 1967: Alan Westin (Law)
    • 1975: Irwin Altman (Anthropology)
      • 1992: Sandra Petronio (Communication Privacy Management, CPM, theory)
      • 2003: Palen and Dourish’s interpretation
    • 2008: Daniel Solove (Solove’s taxonomy)
    • 2011: Helen Nissenbaum (Contextual integrity theory)



Westin: Privacy as control (1967)

  • “Privacy is the claim of individuals, groups or institutions to determine for themselves when, how, and to what extent information about them is communicated to others.”

-- Alan Westin


Four states of Westin’s theory

  • Four states of privacy
    • Solitude: not being observed by others
    • Intimacy: communicating within a small, chosen group
    • Anonymity: freedom from identification and surveillance
    • Reserve: limiting disclosure of information about oneself, with others respecting that wish


Westin’s theory: Exercise

  • Question:
    • X and Y are sitting in a restaurant and X is talking about his personal life.
    • Z, an eavesdropper sitting at the next table, is listening to them, although X does not realize it.
    • Can you explain, using Alan Westin’s privacy definition and privacy states, whether X’s privacy is being violated in this scenario?

Solitude? Intimacy? Anonymity? Reserve?


Westin’s theory exercise revisited

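  • Violation of intimacy and reserve:
    • X and Y are sitting in a restaurant and X is talking about his personal life.
    • Z, an eavesdropper sitting at the next table, is listening to them, although X does not realize it.
    • Can you explain, using Alan Westin’s privacy definition and privacy states, whether X’s privacy is being violated in this scenario?
    • Intimacy: X meant to share his personal life with Y only, not with a third party
    • Reserve: X did not choose to disclose this information to Z, and Z does not respect that limit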


  • What is ethics? The Belmont Report / Menlo Report
    • Respect for persons: respect users’ autonomy through informed consent
    • Beneficence: assess the risks and minimize harm
    • Justice: be fair to participants, e.g., compensate users according to their effort
    • Respect for law: check the ToS and laws such as the GDPR

  • What is privacy? Westin’s theory
    • Maintain the desired privacy states
    • Check for violations of the desired solitude, intimacy, anonymity, and reserve



Ensuring ethics of internet-mediated research


Internet-mediated research

  • Maximizing benefits vs. minimizing harms
    • Beneficence does not require all harm to be completely eliminated, but
    • ICT research risks can extend beyond “the human subject”
      • Identify other stakeholders, including society as a whole

  • Benefits are generally clearer and easier to identify
  • Assessing harms requires more thought


Internet-mediated research

  • Harms
    • Systems assurance (confidentiality, availability, integrity)
    • Individual and organizational privacy
    • Reputation, emotional well-being, or financial sensitivities
    • Infringement of legal rights (derived from constitution, contract, regulation, or common law)
  • Challenges
    • Informed consent is not always possible
    • What is considered minimal risk?

Minimal risk means that the probability and magnitude of harm or discomfort anticipated in the research are not greater in and of themselves than those ordinarily encountered in daily life or during the performance of routine physical or psychological examinations or tests.

https://www.ncbi.nlm.nih.gov/books/NBK217976/


Large-scale passive measurements

  • Passive measurements
    • Recording observations of existing activity without interacting with the subjects

  • Key issues
    • Large-scale Internet measurement means tracking without explicit consent
    • How do you obtain consent from all subjects involved?
    • How do you assess the risks?


IRB: Institutional Review Board

Many organizations have an ethics review process (sometimes called an Institutional Review Board, IRB). In some cases, research work may clearly have no human subjects, and formal institutional review may not be required. (However, a sentence in the paper stating this evaluation is still required.) In many cases, IRB involvement is appropriate. IRB approval of research is an important factor (and should be mentioned), but the program committee will independently evaluate the ethical soundness of the work just as they evaluate its technical soundness.

-- SIGCOMM 2021 CFP


Censorship Measurements: The case of Encore

  • Background
    • Measuring censorship has been an active area of research
    • The Encore paper, SIGCOMM’15 (*)
  • The experiment
    • Participating websites embed measurement scripts
    • Visitors’ browsers issue cross-origin requests to (potentially censored) websites
    • Whether each request succeeds or fails is recorded

(*) S. Burnett and N. Feamster, “Encore: Lightweight measurement of web censorship with cross-origin requests,” ACM SIGCOMM Computer Communication Review, vol. 45, no. 4, pp. 653–667, 2015.

Historically, there has been strong disagreement about whether this project needed IRB review.

What do you think? https://pollev.com/gtangil (we return to this in the backup slides / Q&A)


How to do ethical internet-mediated research?

  • Proper risk assessment
  • Convey those risks to the volunteers
    • Give clear, concise, and informative details
    • Let them opt in to the research
  • Is it enough to rely on well-established boards?
    • IRB approval is not always enough
    • Technical program committees (TPCs) do not always look out for ethical violations
    • More food for thought:
      • The “Hypocrite Commits” case (see the IEEE S&P’21 statement *)
  • Hold yourself to very high ethical standards

(*) https://davisjam.medium.com/ethical-conduct-in-cybersecurity-research-86d13b6b6eed


Ensuring privacy of internet-mediated research


Westin’s theory exercise revisited

  • Violation of intimacy and reserve:
    • X and Y are sitting in a restaurant and X is talking about his personal life.
    • Z, an eavesdropper sitting at the next table, is listening to them, although X does not realize it.
    • Can you explain, using Alan Westin’s privacy definition and privacy states, whether X’s privacy is being violated in this scenario?

  • Now substitute X = user, Y = a website, Z = tracking technology

This is a privacy violation, because the user was not clearly informed whether data is collected or how it will be used


How to get informed consent?

A consent form that includes a privacy policy


Bad privacy policies: Facebook example

For reference: number of words in the Magna Carta = 4,594


Good practice: Short + structured privacy policy

  • Write the privacy policy in a structured format
    • What personal information do you collect?
    • Why and how do you collect it?
    • What are the potential harms to the user? How is the data secured?

  • Institutional IRBs often publish template informed consent forms online


Summary: what to do for privacy-preserving ethical internet-mediated research?


  • An IRB-approved study design (a baseline that is not sufficient on its own: also follow the ethical principles)
  • A concise privacy policy in the informed consent form, given to users before the study
  • Minimize the amount of data collected
  • Use data only for the IRB-approved purpose
  • Share data only with users’ prior permission (preferred) and/or under a very specific data usage agreement
  • If a design seems unethical to you or your team members, it probably is!
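The consent and data-minimization points above can be built into the collection pipeline. The following is a minimal sketch that assumes a hypothetical tracking-measurement record with the fields `user_id`, `url`, `tracker_domain`, `timestamp` (ISO 8601 string), `ip_address`, and `consented`. The sketch drops records that have no opt-in consent. For the rest, it keeps only the tracker domain, the visited host, and the day, and it replaces the raw user identifier with a keyed hash (HMAC-SHA256). The field names, the choice of retained fields, and the key handling are illustrative assumptions, not a prescribed method. Keyed hashing is pseudonymization, not anonymization.

```python
import hashlib
import hmac
from typing import Optional
from urllib.parse import urlsplit


def pseudonymize(user_id: str, key: bytes) -> str:
    """Replace a raw identifier with a keyed hash (HMAC-SHA256).

    This is pseudonymization, not anonymization: anyone holding `key`
    can recompute the mapping, so store the key apart from the data.
    """
    return hmac.new(key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()


def minimize(record: dict, key: bytes) -> Optional[dict]:
    """Return a reduced copy of a (hypothetical) tracking record.

    Returns None when the participant did not opt in, so nothing is kept.
    """
    if not record.get("consented", False):
        return None
    return {
        # Keep only what the (hypothetical) approved analysis needs.
        "tracker_domain": record["tracker_domain"],
        # Host only: drop paths and query strings that may carry personal data.
        "visited_host": urlsplit(record["url"]).hostname,
        # Coarsen an ISO 8601 timestamp ("YYYY-MM-DDThh:mm:ssZ") to the day.
        "day": record["timestamp"][:10],
        # Pseudonymous participant ID instead of the raw user ID.
        "participant": pseudonymize(record["user_id"], key),
        # Fields such as "ip_address" are deliberately never copied.
    }


if __name__ == "__main__":
    sample = {
        "user_id": "alice",
        "url": "https://news.example/article?id=42",
        "tracker_domain": "tracker.example",
        "timestamp": "2021-06-21T10:15:00Z",
        "ip_address": "203.0.113.7",
        "consented": True,
    }
    # Hypothetical key; in practice load it from a secret store, not source code.
    print(minimize(sample, b"hypothetical-key-kept-apart-from-the-data"))
```

Coarsening the data (the host instead of the full URL, the day instead of the exact time) gives up some analytic detail in exchange for a lower re-identification risk. The fields to keep should follow the purpose stated in the IRB-approved protocol.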


References

[1] D. Dittrich and E. Kenneally, “The Menlo Report: Ethical Principles Guiding Information and Communication Technology Research,” U.S. Department of Homeland Security, Aug. 2012.

[2] C. Partridge and M. Allman, “Addressing Ethical Considerations in Network Measurement Papers,” ACM SIGCOMM Workshop on Ethics in Networked Systems Research, Aug. 2015.

[3] M. Allman, “Traffic Monitoring Considered Reasonable,” IEEE Symposium on Security and Privacy Cyber-security Research Ethics Dialog and Strategy Workshop (CREDS), May 2013.

[4] “The Belmont Report: Ethical Principles and Guidelines for the Protection of Human Subjects of Research.” https://www.hhs.gov/ohrp/regulations-and-policy/belmont-report/index.html

[5] J. van der Ham, “Ethics and Internet Measurements,” 2017 IEEE Security and Privacy Workshops (SPW), IEEE, 2017.

[6] B. Jones et al., “Ethical Concerns for Censorship Measurement,” Proceedings of the 2015 ACM SIGCOMM Workshop on Ethics in Networked Systems Research, 2015.

[7] A. Narayanan and B. Zevenbergen, “No Encore for Encore? Ethical Questions for Web-Based Censorship Measurement,” SSRN Electronic Journal. [Online]. Available: http://dx.doi.org/10.2139/ssrn.2665148


Questions, comments or thoughts?


Censorship Measurements: The case of Encore (cont)

  • Visit the following link and answer these questions:
    • https://pollev.com/gtangil
  • Key questions:
    • Does the project need IRB review? Are there human subjects involved?
    • Is there any risk involved? Is there beneficence?


Censorship Measurements: The case of Encore (cont)

  • Does the project need IRB review?
    • It all comes down to the following question:
  • Are there human subjects involved?
    • The first author’s IRB concluded that there were no human subjects
    • The second author’s IRB declined a formal review for the same reason
    • The SIGCOMM’15 PC felt there were

  • Is there any risk involved?
    • Individuals do face risks. Are they minimal? It depends on the type of censored website.

IMPORTANT DISAGREEMENT