1 of 55

A Crash Course on Ethics in Natural Language Processing

Version 1.0

Annemarie Friedrich and Torsten Zesch

License: CC-BY

2 of 55

Ethics for NLP

What comes to your mind when you think of ethics?

What comes to your mind when you think about ethics for NLP?

Have you encountered any ethical problems in your life?

Why do you think this topic is important?

What do you expect to learn in this crash course?

3 of 55

Why does Ethics matter for NLP?

NLP has the aim of modeling language, an inherently human function

NLP works with textual data or human subjects → not free of bias, prejudice, …

Language technology is widely applied (e.g. on social media) → can potentially harm anyone

Language technology shapes the way we experience the world

Bias

Privacy

Fairness

Dual Use

Environmental Issues

...

4 of 55

Sources and Types of Harm - Overview

[Overview diagram: Data flows into an NLP System, which produces outcomes. Harm can enter at both stages: bias and direct harm can originate in the data as well as in the system itself, leading to unfair outcomes.]

5 of 55

Learning Goals

After this course, you will be able to:

  • Understand terminology and concepts related to ethics in NLP
  • Analyze a given task, method or system for ethical issues
  • Understand how NLP applications can cause harm
  • Analyze ethical issues under different ethical perspectives

6 of 55

What is Ethics?

Branch of Philosophy

Ethics is the philosophical study of morality. It is the study of what are good and bad ends to pursue in life and what is right and wrong to do in the conduct of life. It is [...] primarily a practical discipline.

(Deigh, 2010, p. 7)

Synonym for Moral Code

Sometimes “ethics” is used to refer to the moral code or system of a particular tradition.

Examples: Christian ethics, professional ethics

How do these meanings relate to “Ethics for NLP”?

7 of 55

What is Morality?

Universal Concept

Universal ideal of what one ought to do or ought not to do, guided by reason / rational grounds.

Conventional System of Community

The members’ shared beliefs about wrong and right, good and evil, and the corresponding customs and practices that prevail in the society.

How do these concepts relate to “Ethics for NLP”?

8 of 55

Whose Life Matters More?

http://moralmachine.mit.edu/hl/de

Try it out!

9 of 55

Two ethical theories

Deontology

Deon (Greek) = duty

“Identify your duty and act accordingly”

Generalization principle: prioritizes intent as the source of ethical action; an action is ethical if it would be reasonable for everyone to act on the same intent.

Teleology

Telos (Greek) = goal

Outcome-oriented

Utilitarianism

“Choose that action that optimizes the outcome”

“An action is ethical only if it is not irrational for the agent to believe that no other action results in greater expected utility” (Bentham 1789)
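To make "optimize the outcome" concrete, here is a toy expected-utility calculation (all probabilities and utility values below are invented for illustration, not from the slides):

```python
# Utilitarian decision rule: pick the action with the highest expected utility,
# where expected utility = sum over outcomes of probability * utility.
actions = {
    "deploy filter": [(0.9, +10), (0.1, -50)],  # (probability, utility) pairs
    "do nothing":    [(1.0, 0)],
}

expected = {a: sum(p * u for p, u in outcomes) for a, outcomes in actions.items()}
print(expected)                          # {'deploy filter': 4.0, 'do nothing': 0.0}
print(max(expected, key=expected.get))   # utilitarian choice: 'deploy filter'
```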

10 of 55

Moral vs. Legal

        | legal                   | illegal
moral   | Doing your homework     | Civil disobedience
immoral | Cheating on your spouse | Murder

11 of 55

Reading Assignment (Homework)

Hovy & Spruit: The Social Impact of Natural Language Processing. (ACL 2016)

TODO: add questions / instructions regarding the paper

Political correctness classifier?

12 of 55

Source of Harm - Direct

[Diagram] Example: an NLP system for analyzing medical documents makes an error that results in a drug overdose killing the patient.

13 of 55

Dual Use

NLP Task                         | Beneficial Use          | Malicious Use
Hate speech detection            | Fighting hate crimes    | Censorship of free speech
Detection of fake news / reviews | Fighting misinformation | Generation of fake news / reviews
...                              | ...                     | ...

Can you think of other NLP tasks that have beneficial but also potentially malicious uses?

Assume you are publishing a piece of software on GitHub. Should you mention potential malicious uses in the corresponding Readme?

14 of 55

Source of Harm - Bias

[Diagram: Data → NLP System; bias enters the system through the data it is trained on.]

15 of 55

Doctor vs. Nurse

The doctor recommended performing an X-ray.

He/She said …

The nurse recommended performing an X-ray.

He/She said …

Do you think “he” or “she” is a more likely continuation in the above cases (respectively)?

What would happen if you asked a large pre-trained language model?
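One way to find out is to probe a masked language model directly. A minimal sketch using the Hugging Face transformers library (the model choice and exact prompt wording here are our own illustration, not from the slides):

```python
# Probe a masked language model for pronoun probabilities after "doctor"/"nurse".
# Requires: pip install transformers torch
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")

for role in ["doctor", "nurse"]:
    prompt = f"The {role} recommended performing an X-ray. [MASK] said it is necessary."
    # Restrict predictions to the two pronouns of interest and compare their scores.
    for result in fill(prompt, targets=["he", "she"]):
        print(role, result["token_str"], round(result["score"], 4))
```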

16 of 55

Bias in Machine Translation

Image source: https://arxiv.org/pdf/1809.02208.pdf

Useful or harmful?

17 of 55

Bias in Machine Translation

1. Detect gender-neutral queries
2. Generate gender-specific translations
3. Check for accuracy
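A toy sketch of this three-step pipeline follows (the pronoun rule and the translate placeholder are hypothetical simplifications; a production system such as Google Translate's is far more involved):

```python
# Toy pipeline: detect a gender-neutral query, then offer both gendered translations.
GENDER_NEUTRAL_PRONOUNS = {"o"}  # e.g., the Turkish third-person pronoun "o"

def is_gender_neutral(source_sentence: str) -> bool:
    # Step 1: detect gender-neutral queries (here: a naive pronoun lookup).
    return any(tok.lower().strip(".") in GENDER_NEUTRAL_PRONOUNS
               for tok in source_sentence.split())

def translate(sentence: str, gender: str) -> str:
    # Step 2: placeholder for a real MT system constrained to one gender.
    return {"feminine": "She is a doctor.", "masculine": "He is a doctor."}[gender]

query = "O bir doktor."
if is_gender_neutral(query):
    # Step 3 (checking both outputs for accuracy) is left to the real system.
    for gender in ("feminine", "masculine"):
        print(f"{gender}: {translate(query, gender)}")
```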

18 of 55

What is Bias?

Cognitive bias arises from the human mind's tendency to categorize the world → this simplifies processing.

Social biases in data, algorithms, and applications

Statistical bias in machine learning

  • Inductive bias: the assumptions a model makes about the target function in order to generalize from data

19 of 55

What is Bias? (Technical View)

Bias in machine learning

Bayesian probabilities: prior

May be intended (e.g., domain adaptation) or unintended

Is bias always a bad thing?
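A tiny worked example of "bias as a Bayesian prior" (all numbers invented): the same likelihoods lead to different decisions under different priors, which is exactly what domain adaptation exploits intentionally.

```python
# P(pos | x) via Bayes' rule for a binary class, under different priors.
def posterior(prior_pos: float, lik_pos: float, lik_neg: float) -> float:
    num = prior_pos * lik_pos
    return num / (num + (1 - prior_pos) * lik_neg)

lik_pos, lik_neg = 0.6, 0.5        # P(x | pos), P(x | neg) held fixed
for prior in (0.5, 0.2):           # e.g., source domain vs. target domain
    print(f"prior={prior}: P(pos|x) = {posterior(prior, lik_pos, lik_neg):.2f}")
# prior=0.5 -> 0.55 (predict pos); prior=0.2 -> 0.23 (predict neg)
```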

20 of 55

Why is Bias Problematic? (Social View)

NLP Applications

Employment matching, advertisement placement, parole decisions, search, chatbots, face recognition, ...

Social Stereotypes

Gender, Race, Disability, Age, Sexual orientation, Culture, Class, Poverty, Language, Religion, National origin, ...

Sap et al.: The Risk of Racial Bias in Hate Speech Detection. ACL 2019.

21 of 55

Why is Bias Problematic?

Outcome Disparity

Error Disparity

Word Error Rate in automatic captioning is higher for female speakers compared to male speakers (Tatman, 2017).

Because a “COOKING” event is taking place, the model is more likely to predict the agent to be a woman.

(Zhao et al., 2017)

Image sources: https://www.aclweb.org/anthology/W17-1606.pdf,

https://www.aclweb.org/anthology/D17-1323.pdf


See also Shah et al. (2020)

22 of 55

Why is Bias Problematic?

(Technical View)

Outcome / Error disparity

Models might amplify bias

A 51:49 distribution in a feature may lead to a 100:0 decision, as the sketch below illustrates.
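A toy sketch of this amplification effect (assuming a plain argmax/MAP decision rule; the numbers are illustrative):

```python
# How an argmax decision rule turns a 51:49 skew in the data into 100:0 predictions.
p_woman_given_cooking = 0.51  # share of COOKING training examples with a female agent

def predict_agent(event: str) -> str:
    # The classifier always outputs the (slightly) more probable class ...
    return "woman" if p_woman_given_cooking > 0.5 else "man"

# ... so 100% of COOKING predictions say "woman", although only 51% of the data did.
print([predict_agent("cooking") for _ in range(5)])
```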

Is it wrong to build models replicating “real world data”?

In what circumstances?

23 of 55

Sources of Bias in NLP (Shah et al., 2020)

Image Source: https://www.aclweb.org/anthology/2020.acl-main.468.pdf

24 of 55

De-Biasing of Word Embeddings

[Diagram: word embedding space with a gender direction (from "he" to "she"); de-biasing projects gender-neutral words off this direction so they become neutral.]
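A minimal sketch of the "hard de-biasing" projection step in the style of Bolukbasi et al. (2016); the vectors below are made-up toy values, not real embeddings:

```python
import numpy as np

def debias(word_vec: np.ndarray, gender_dir: np.ndarray) -> np.ndarray:
    # Remove the component of the word vector that lies along the gender direction.
    gender_dir = gender_dir / np.linalg.norm(gender_dir)
    return word_vec - np.dot(word_vec, gender_dir) * gender_dir

he, she = np.array([0.8, 0.1, 0.3]), np.array([0.2, 0.9, 0.3])
gender_direction = she - he          # a simple, common estimate of the bias direction
doctor = np.array([0.5, 0.3, 0.7])   # toy vector for a profession word

print(debias(doctor, gender_direction))  # gender component removed
```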

25 of 55

Bias Exercise

Source: http://wordbias.umiacs.umd.edu

26 of 55

Source of Harm - Unfair Outcomes

[Diagram] Example: an NLP system filtering job applications gives better chances to people living in a certain area.

27 of 55

Fairness

Treating everyone equally is fair, right?

So, everyone gets the same grade from now on ;)

Fundamental principle of justice: "equals should be treated equally and unequals unequally"

28 of 55

Group vs. Individual Fairness

group fairness

  • errors should be distributed similarly across protected groups

individual fairness

  • similar individuals should be treated similarly regardless of group membership

Group and individual fairness cannot, in general, be achieved at the same time.

Which groups are/should be protected?

How can we measure similarity of individuals?
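A minimal sketch (toy data, made-up labels) of checking one group-fairness criterion from the list above: are error rates similar across two protected groups?

```python
from collections import defaultdict

# (group, true_label, predicted_label) triples -- hypothetical classifier output
predictions = [
    ("A", 1, 1), ("A", 0, 0), ("A", 1, 0), ("A", 0, 0),
    ("B", 1, 0), ("B", 0, 1), ("B", 1, 0), ("B", 0, 0),
]

errors, totals = defaultdict(int), defaultdict(int)
for group, gold, pred in predictions:
    totals[group] += 1
    errors[group] += int(gold != pred)

for group in sorted(totals):
    print(group, "error rate:", errors[group] / totals[group])
# Large gaps between the groups' error rates indicate error disparity.
```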

29 of 55

30 of 55

https://medium.com/ibm-watson/ethics-in-ai-responsibilities-for-data-analysts-part-2-d76f2343e4d1

31 of 55

Source of Harm - Input/Training Data

[Diagram: Data → NLP System; harm can already arise when input/training data is collected, e.g., by violating privacy.]

32 of 55

Privacy

“I’ve got nothing to hide.”

Do you have curtains? / Do you close your shutters at night?

Can I see your credit card bills from last year?

33 of 55

A Taxonomy of Privacy (Solove, 2007)

Privacy = intimacy?

Privacy = the right to be let alone?

Problems and harms related to privacy

“Privacy [...] is a plurality of different things that do not share one element in common but that nevertheless bear a resemblance to each other.”

34 of 55

Data Privacy Regulations

European Regulation 2016/679

General Data Protection Regulation (GDPR)

Main rights of the “data subject” (natural person):

  • Right of access
  • Right of rectification
  • Right to erasure (“right to be forgotten”)
  • Right to withdraw consent at any time
  • Right to lodge a complaint with a supervisory authority
  • Right to restriction of processing
  • Right to data portability

Similar laws in the US: California Consumer Privacy Act

Applies to the data of all EU citizens, even if the controller operates from a country outside the EU!

35 of 55

Data Privacy vs. Data Ethics

  • Data privacy is responsibly collecting, using and storing data about people, in line with the expectations of those people, customers, regulations and laws.
  • Data ethics is doing the right thing with data, considering the human impact from all sides, and making decisions based on your values.

[based on: Lawler, 2019]

“Just because we can do something, doesn’t mean we should.”

Should a company sell user information to political campaigns?

36 of 55

Anonymization (De-Identification)

After having run some anonymization system on our data, is everything fine?

Image Source: https://www.aclweb.org/anthology/2020.lrec-1.870/

HitzalMed

(Lopez et al., 2020)
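A naive de-identification sketch (toy regex patterns; real systems such as HitzalMed are far more sophisticated) and a reminder of why running an anonymizer does not automatically make everything fine:

```python
import re

PATTERNS = {
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-\s]?\d{3,4}[-\s]?\d{4}\b"),
}

def deidentify(text: str) -> str:
    # Replace each matched span with its category label.
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

note = "Patient seen on 12/03/2021, call 555-123-4567. The mayor's wife, treated for ..."
print(deidentify(note))
# Dates and phone numbers are masked, but "The mayor's wife" still identifies
# the patient -- indirect identifiers survive naive masking.
```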

37 of 55

Authorship Attribution / Author Profiling

What are potential chances and risks of this type of technology?

38 of 55

Reading Assignment / Discussion

Daniel J. Solove. 'I've Got Nothing to Hide' and Other Misunderstandings of Privacy. San Diego Law Review, Vol. 44, p. 745, 2007

Germany’s complicated relationship with Google Street View. NY Times, April 2013.

Questions to think about / discuss:

Which dimensions of privacy matter most to you?

A software developer accidentally notices a document in which a user is drafting a suicide note. Should they contact the police to save a life, or respect the user's secret?

Can you imagine a situation where interfering with someone’s privacy leads to an economic / financial issue for that person?

39 of 55

Word Error Rates for Automatic Captioning on YouTube (Tatman, 2017)

WER higher for Scottish speakers

WER higher for female speakers compared to male speakers

Image Source: https://www.aclweb.org/anthology/W17-1606
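To make such disparity measurements concrete, a minimal word error rate (WER) implementation via word-level edit distance; the example sentences are invented:

```python
# WER = (substitutions + deletions + insertions) / reference length
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # Standard dynamic-programming edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("close the shutters at night", "close the shutter at nine"))  # 0.4
```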

40 of 55

Demographic Factors Improve Classification Performance (Hovy, 2015)

[Chart: distribution of categories by gender; x-axis: topics]

Is it okay to leverage the author’s gender information as explicit features for text classification?

What would be recommended from a utilitarian / generalization perspective?

Image Source: https://www.aclweb.org/anthology/P15-1073.pdf

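Mechanically, "leveraging the author's gender as an explicit feature" just means appending a demographic column to the text features. A minimal sketch with invented data (Hovy (2015) works with real demographic metadata and richer models; this only shows the plumbing):

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

texts = ["review text one", "review text two", "review text three", "review text four"]
author_gender = np.array([[0], [1], [0], [1]])  # hypothetical author metadata
labels = [0, 1, 0, 1]                           # e.g., topic category

X_text = CountVectorizer().fit_transform(texts).toarray()
X = np.hstack([X_text, author_gender])          # text features + demographic feature

clf = LogisticRegression().fit(X, labels)
print(clf.predict(X))
```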

41 of 55

Reading Assignment

Prabhumoye et al.: Case Study: Deontological Ethics in NLP. NAACL 2021.

Consider a project that you are working on / have worked on or pick a recent research paper from the ACL Anthology. Analyse the method / system both from a utilitarian and from a generalization perspective. How would scholars of each ethical theory evaluate the ethicality of the method/system?

42 of 55

“Applications”: NLP for Social Good

Civility in communication: techniques to monitor trolling, hate speech, abusive language, detect fake news, etc.

Image Source: https://www.stiftung-nv.de/de/publikation/kurzanalyse-zu-trumps-crime-tweet-deutschland-viel-aufmerksamkeit-wenig-unterstuetzung

43 of 55

Other topics

Explainability

Crowdsourcing

Pollution

Safety

Pick a topic of your choice and research its relationship to ethics. What are common arguments made? Do you agree? Can you find interesting examples for ethical or unethical behavior?

44 of 55

Reading Suggestions:

Environmental Issues

Bender et al. (2021). On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? FAccT '21, March 3–10, 2021, Virtual Event, Canada.

Strubell, E., Ganesh, A., & McCallum, A. (2019). Energy and Policy Considerations for Deep Learning in NLP. ACL 2019.

45 of 55

Misc / Practical Hints

IRB = Institutional Review Board (Ethics Review Board)

Reviews all experiments involving human subjects

Find out how to contact the IRB of your institution.

The ACL has adopted the ACM Code of Ethics and Professional Conduct and has published an FAQ with hints on conducting research and publishing in an ethical manner (see, e.g., the ACL-IJCNLP 2021 Ethics FAQ).


46 of 55

Practical summary

Analyse data, task and outcomes for potential harm.

Can benefits outweigh harms?

47 of 55

Retrospection

What have you learned?

What does that mean for you personally?

What was surprising?

48 of 55

References

49 of 55

Literature –  Ethics in NLP 

Overviews

  • Dignum, V. (2019). Responsible artificial intelligence: how to develop and use Ai in a responsible way. Cham, Switzerland: Springer.

  • Fort, K., & Couillault, A. (2016). Yes, We Care! Results of the Ethics and Natural Language Processing Surveys. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC’16). Retrieved from https://www.aclweb.org/anthology/L16-1252 

  • Hovy, D., & Spruit, S. L. (2016). The Social Impact of Natural Language Processing. In K. Erk & N. A. Smith (Chairs), Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, Berlin, Germany. Retrieved from https://www.aclweb.org/anthology/P16-2096.pdf 

50 of 55

Literature –  Ethics in NLP 

Overviews

  • Leidner, J. L., & Plachouras, V. (2017). Ethical by Design: Ethics Best Practices for Natural Language Processing. In D. Hovy, S. Spruit, M. Mitchell, E. M. Bender, M. Strube, & H. Wallach (Chairs), Proceedings of the First ACL Workshop on Ethics in Natural Language Processing. Retrieved from https://www.aclweb.org/anthology/W17-1604.pdf

  • Mittelstadt, B. D., Allo, P., Taddeo, M., Wachter, S., & Floridi, L. (2016). The ethics of algorithms: Mapping the debate. Big Data & Society, 3(2). Retrieved from https://journals.sagepub.com/doi/pdf/10.1177/2053951716679679

  • Zweig, K. A. (2019). Ein Algorithmus hat kein Taktgefühl: wo Künstliche Intelligenz sich irrt, warum uns das betrifft und was wir dagegen tun können. München: Heyne.

51 of 55

Literature –  Ethics in NLP 

Bias

  • Tatman, R. (2017). Gender and Dialect Bias in YouTube's Automatic Captions. In D. Hovy, S. Spruit, M. Mitchell, E. M. Bender, M. Strube, & H. Wallach (Chairs), Proceedings of the First ACL Workshop on Ethics in Natural Language Processing. Retrieved from https://www.aclweb.org/anthology/W17-1606.pdf

  • Prates, M. O. R., Avelar, P. H., & Lamb, L. C. (2019). Assessing gender bias in machine translation: A case study with Google Translate. Neural Computing and Applications, 14(1). Retrieved from https://arxiv.org/pdf/1809.02208.pdf 

  • Stanovsky, G., Smith, N. A., & Zettlemoyer, L. (2019). Evaluating Gender Bias in Machine Translation. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Retrieved from https://www.aclweb.org/anthology/P19-1164.pdf

52 of 55

Literature –  Ethics in NLP 

Bias

  • Davidson, T., Bhattacharya, D., & Weber, I. (2019). Racial Bias in Hate Speech and Abusive Language Detection Datasets. In Proceedings of the Third Workshop on Abusive Language Online (pp. 25–35). Association for Computational Linguistics. Retrieved from https://www.aclweb.org/anthology/W19-3504.pdf 

  • Sap, M., Card, D., Gabriel, S., Choi, Y., & Smith, N. A. (2019). The Risk of Racial Bias in Hate Speech Detection. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (pp. 1668–1678). Association for Computational Linguistics. Retrieved from https://www.aclweb.org/anthology/P19-1163.pdf

53 of 55

Literature –  Ethics in NLP 

Bias

  • Caliskan, A., Bryson, J. J., & Narayanan, A. (2017). Semantics derived automatically from language corpora contain human-like biases. Science (New York, N.Y.), 356(6334), 183–186. Retrieved from https://arxiv.org/pdf/1608.07187.pdf

  • Sheng, E., Chang, K.-W., Natarajan, P., & Peng, N. (2019). The Woman Worked as a Babysitter: On Biases in Language Generation. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) (pp. 3405–3410). Association for Computational Linguistics. Retrieved from https://www.aclweb.org/anthology/D19-1339.pdf

  • Zhao, J., Wang, T., Yatskar, M., Ordonez, V., & Chang, K.-W. (2017). Men Also Like Shopping: Reducing Gender Bias Amplification using Corpus-level Constraints. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (pp. 2979–2989). Association for Computational Linguistics. Retrieved from https://www.aclweb.org/anthology/D17-1323.pdf

54 of 55

Literature –  Ethics in NLP 

Fairness

  • Loukina, A., Madnani, N., & Zechner, K. (2019). The many dimensions of algorithmic fairness in educational applications. In Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications (pp. 1–10). Association for Computational Linguistics. Retrieved from https://www.aclweb.org/anthology/W19-4401.pdf

55 of 55

Literature –  Ethics in NLP 

Gender Stereotypes

  • Bhaskaran, J., & Bhallamudi, I. (2019). Good Secretaries, Bad Truck Drivers? Occupational Gender Stereotypes in Sentiment Analysis. In Proceedings of the First Workshop on Gender Bias in Natural Language Processing (pp. 62–68). Association for Computational Linguistics. Retrieved from https://www.aclweb.org/anthology/W19-3809.pdf