1 of 55

A Crash Course on Ethics in Natural Language Processing

Version 1.0

Annemarie Friedrich and Torsten Zesch

License: CC-BY

2 of 55

Ethics for NLP

What comes to your mind when you think of ethics?

What comes to your mind when you think about ethics for NLP?

Have you encountered any ethical problems in your life?

Why do you think this topic is important?

What do you expect to learn in this crash course?

3 of 55

Why does Ethics matter for NLP?

NLP has the aim of modeling language, an inherently human function

NLP works with textual data or human subjects → not free of bias, prejudice, …

Language technology is widely applied (e.g. on social media) → can potentially harm anyone

Language technology shapes the way we experience the world

Bias

Privacy

Fairness

Dual Use

Environmental Issues

...

4 of 55

Sources and Types of Harm - Overview

[Overview diagram: Data flows into an NLP System, which produces outcomes. Harm can enter at both stages: bias and direct harm can originate in the data as well as in the system itself, leading to unfair outcomes.]

5 of 55

Learning Goals

After this course, you will be able to:

  • Understand terminology and concepts related to ethics in NLP
  • Analyze a given task, method or system for ethical issues
  • Understand how NLP applications can cause harm
  • Analyze ethical issues under different ethical perspectives

6 of 55

What is Ethics?

Branch of Philosophy

Ethics is the philosophical study of morality. It is the study of what are good and bad ends to pursue in life and what is right and wrong to do in the conduct of life. It is [...] primarily a practical discipline.

(Deigh, 2010, p. 7)

Synonym for Moral Code

Sometimes “ethics” is used to refer to the moral code or system of a particular tradition.

Examples: Christian ethics, professional ethics

How do these meanings relate to “Ethics for NLP”?

7 of 55

What is Morality?

Universal Concept

Universal ideal of what one ought to do or ought not to do, guided by reason / rational grounds.

Conventional System of Community

The members’ shared beliefs about wrong and right, good and evil, and the corresponding customs and practices that prevail in the society.

How do these concepts relate to “Ethics for NLP”?

8 of 55

Whose Life Matters More?

http://moralmachine.mit.edu/hl/de

Try it out!

9 of 55

Two ethical theories

Deontology

Deon (Greek) = duty

“Identify your duty and act accordingly”

Generalization principle: prioritizes intent as the source of ethical action; an action is ethical if it would be reasonable for everyone to act on the same intent.

Teleology

Telos (Greek) = goal

Outcome-oriented

Utilitarianism

“Choose that action that optimizes the outcome”

“An action is ethical only if it is not irrational for the agent to believe that no other action results in greater expected utility” (Bentham 1789)
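To make "optimize the outcome" concrete, here is a toy expected-utility calculation (all probabilities and utility values below are invented for illustration, not from the slides):

```python
# Utilitarian decision rule: pick the action with the highest expected utility,
# where expected utility = sum over outcomes of probability * utility.
actions = {
    "deploy filter": [(0.9, +10), (0.1, -50)],  # (probability, utility) pairs
    "do nothing":    [(1.0, 0)],
}

expected = {a: sum(p * u for p, u in outcomes) for a, outcomes in actions.items()}
print(expected)                          # {'deploy filter': 4.0, 'do nothing': 0.0}
print(max(expected, key=expected.get))   # utilitarian choice: 'deploy filter'
```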

10 of 55

Moral vs. Legal

        | legal                   | illegal
moral   | Doing your homework     | Civil disobedience
immoral | Cheating on your spouse | Murder

11 of 55

Reading Assignment (Homework)

Hovy & Spruit: The Social Impact of Natural Language Processing. (ACL 2016)

TODO: add questions / instructions regarding the paper

Political correctness classifier?

12 of 55

Source of Harm - Direct

[Diagram] Example: an NLP system for analyzing medical documents makes an error that results in a drug overdose killing the patient.

13 of 55

Dual Use

NLP Task                         | Beneficial Use          | Malicious Use
Hate speech detection            | Fighting hate crimes    | Censorship of free speech
Detection of fake news / reviews | Fighting misinformation | Generation of fake news / reviews
...                              | ...                     | ...

Can you think of other NLP tasks that have beneficial but also potentially malicious uses?

Assume you are publishing a piece of software on GitHub. Should you mention potential malicious uses in the corresponding Readme?

14 of 55

Source of Harm - Bias

[Diagram: Data → NLP System; bias enters the system through the data it is trained on.]

15 of 55

Doctor vs. Nurse

The doctor recommended performing an X-ray.

He/She said …

The nurse recommended performing an X-ray.

He/She said …

Do you think “he” or “she” is a more likely continuation in the above cases (respectively)?

What would happen if you asked a large pre-trained language model?
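One way to find out is to probe a masked language model directly. A minimal sketch using the Hugging Face transformers library (the model choice and exact prompt wording here are our own illustration, not from the slides):

```python
# Probe a masked language model for pronoun probabilities after "doctor"/"nurse".
# Requires: pip install transformers torch
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")

for role in ["doctor", "nurse"]:
    prompt = f"The {role} recommended performing an X-ray. [MASK] said it is necessary."
    # Restrict predictions to the two pronouns of interest and compare their scores.
    for result in fill(prompt, targets=["he", "she"]):
        print(role, result["token_str"], round(result["score"], 4))
```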

16 of 55

Bias in Machine Translation

Image source: https://arxiv.org/pdf/1809.02208.pdf

Useful or harmful?

17 of 55

Bias in Machine Translation

1. Detect gender-neutral queries
2. Generate gender-specific translations
3. Check for accuracy
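A toy sketch of this three-step pipeline follows (the pronoun rule and the translate placeholder are hypothetical simplifications; a production system such as Google Translate's is far more involved):

```python
# Toy pipeline: detect a gender-neutral query, then offer both gendered translations.
GENDER_NEUTRAL_PRONOUNS = {"o"}  # e.g., the Turkish third-person pronoun "o"

def is_gender_neutral(source_sentence: str) -> bool:
    # Step 1: detect gender-neutral queries (here: a naive pronoun lookup).
    return any(tok.lower().strip(".") in GENDER_NEUTRAL_PRONOUNS
               for tok in source_sentence.split())

def translate(sentence: str, gender: str) -> str:
    # Step 2: placeholder for a real MT system constrained to one gender.
    return {"feminine": "She is a doctor.", "masculine": "He is a doctor."}[gender]

query = "O bir doktor."
if is_gender_neutral(query):
    # Step 3 (checking both outputs for accuracy) is left to the real system.
    for gender in ("feminine", "masculine"):
        print(f"{gender}: {translate(query, gender)}")
```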

18 of 55

What is Bias?

Cognitive bias arises from the human mind's tendency to categorize the world → this simplifies processing.

Social biases in data, algorithms, and applications

Statistical bias in machine learning

  • Inductive bias: the assumptions a model makes about the target function in order to generalize from data

19 of 55

What is Bias? (Technical View)

Bias in machine learning

Bayesian probabilities: prior

May be intended (e.g., domain adaptation) or unintended

Is bias always a bad thing?
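A tiny worked example of "bias as a Bayesian prior" (all numbers invented): the same likelihoods lead to different decisions under different priors, which is exactly what domain adaptation exploits intentionally.

```python
# P(pos | x) via Bayes' rule for a binary class, under different priors.
def posterior(prior_pos: float, lik_pos: float, lik_neg: float) -> float:
    num = prior_pos * lik_pos
    return num / (num + (1 - prior_pos) * lik_neg)

lik_pos, lik_neg = 0.6, 0.5        # P(x | pos), P(x | neg) held fixed
for prior in (0.5, 0.2):           # e.g., source domain vs. target domain
    print(f"prior={prior}: P(pos|x) = {posterior(prior, lik_pos, lik_neg):.2f}")
# prior=0.5 -> 0.55 (predict pos); prior=0.2 -> 0.23 (predict neg)
```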

20 of 55

Why is Bias Problematic? (Social View)

NLP Applications

Employment matching, advertisement placement, parole decisions, search, chatbots, face recognition, ...

Social Stereotypes

Gender, Race, Disability, Age, Sexual orientation, Culture, Class, Poverty, Language, Religion, National origin, ...

Sap et al.: The Risk of Racial Bias in Hate Speech Detection. ACL 2019.

21 of 55

Why is Bias Problematic?

Outcome Disparity

Error Disparity

Word Error Rate in automatic captioning is higher for female speakers compared to male speakers (Tatman, 2017).

Because a “COOKING” event is taking place, the model is more likely to predict the agent to be a woman.

(Zhao et al., 2017)

Image sources: https://www.aclweb.org/anthology/W17-1606.pdf,

https://www.aclweb.org/anthology/D17-1323.pdf


See also Shah et al. (2020)

22 of 55

Why is Bias Problematic?

(Technical View)

Outcome / Error disparity

Models might amplify bias

A 51:49 distribution in a feature may lead to a 100:0 decision, as the sketch below illustrates.
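A toy sketch of this amplification effect (assuming a plain argmax/MAP decision rule; the numbers are illustrative):

```python
# How an argmax decision rule turns a 51:49 skew in the data into 100:0 predictions.
p_woman_given_cooking = 0.51  # share of COOKING training examples with a female agent

def predict_agent(event: str) -> str:
    # The classifier always outputs the (slightly) more probable class ...
    return "woman" if p_woman_given_cooking > 0.5 else "man"

# ... so 100% of COOKING predictions say "woman", although only 51% of the data did.
print([predict_agent("cooking") for _ in range(5)])
```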

Is it wrong to build models replicating “real world data”?

In what circumstances?

23 of 55

Sources of Bias in NLP (Shah et al., 2020)

Image Source: https://www.aclweb.org/anthology/2020.acl-main.468.pdf

24 of 55

De-Biasing of Word Embeddings

[Diagram: word embedding space with a gender direction (from "he" to "she"); de-biasing projects gender-neutral words off this direction so they become neutral.]
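A minimal sketch of the "hard de-biasing" projection step in the style of Bolukbasi et al. (2016); the vectors below are made-up toy values, not real embeddings:

```python
import numpy as np

def debias(word_vec: np.ndarray, gender_dir: np.ndarray) -> np.ndarray:
    # Remove the component of the word vector that lies along the gender direction.
    gender_dir = gender_dir / np.linalg.norm(gender_dir)
    return word_vec - np.dot(word_vec, gender_dir) * gender_dir

he, she = np.array([0.8, 0.1, 0.3]), np.array([0.2, 0.9, 0.3])
gender_direction = she - he          # a simple, common estimate of the bias direction
doctor = np.array([0.5, 0.3, 0.7])   # toy vector for a profession word

print(debias(doctor, gender_direction))  # gender component removed
```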

25 of 55

Bias Exercise

Source: http://wordbias.umiacs.umd.edu

26 of 55

Source of Harm - Unfair Outcomes

[Diagram] Example: an NLP system filtering job applications gives better chances to people living in a certain area.

27 of 55

Fairness

Treating everyone equally is fair, right?

So, everyone gets the same grade from now on ;)

Fundamental principle of justice: "equals should be treated equally and unequals unequally"

28 of 55

Group vs. Individual Fairness

group fairness

  • errors should be distributed similarly across protected groups

individual fairness

  • similar individuals should be treated similarly regardless of group membership

Group and individual fairness cannot, in general, be achieved at the same time.

Which groups are/should be protected?

How can we measure similarity of individuals?
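A minimal sketch (toy data, made-up labels) of checking one group-fairness criterion from the list above: are error rates similar across two protected groups?

```python
from collections import defaultdict

# (group, true_label, predicted_label) triples -- hypothetical classifier output
predictions = [
    ("A", 1, 1), ("A", 0, 0), ("A", 1, 0), ("A", 0, 0),
    ("B", 1, 0), ("B", 0, 1), ("B", 1, 0), ("B", 0, 0),
]

errors, totals = defaultdict(int), defaultdict(int)
for group, gold, pred in predictions:
    totals[group] += 1
    errors[group] += int(gold != pred)

for group in sorted(totals):
    print(group, "error rate:", errors[group] / totals[group])
# Large gaps between the groups' error rates indicate error disparity.
```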

29 of 55

30 of 55

https://medium.com/ibm-watson/ethics-in-ai-responsibilities-for-data-analysts-part-2-d76f2343e4d1

31 of 55

Source of Harm - Input/Training Data

[Diagram: Data → NLP System; harm can already arise when input/training data is collected, e.g., by violating privacy.]

32 of 55

Privacy

“I’ve got nothing to hide.”

Do you have curtains? / Do you close your shutters at night?

Can I see your credit card bills from last year?

33 of 55

A Taxonomy of Privacy (Solove, 2007)

Privacy = intimacy?

Privacy = the right to be let alone?

Problems and harms related to privacy

“Privacy [...] is a plurality of different things that do not share one element in common but that nevertheless bear a resemblance to each other.”

34 of 55

Data Privacy Regulations

European Regulation 2016/679

General Data Protection Regulation (GDPR)

Main rights of the “data subject” (natural person):

  • Right of access
  • Right of rectification
  • Right to erasure (“right to be forgotten”)
  • Right to withdraw consent at any time
  • Right to lodge a complaint with a supervisory authority
  • Right to restriction of processing
  • Right to data portability

Similar laws in the US: California Consumer Privacy Act

Applies to the data of all EU citizens, even if the controller operates from a country outside the EU!

35 of 55

Data Privacy vs. Data Ethics

  • Data privacy is responsibly collecting, using and storing data about people, in line with the expectations of those people, customers, regulations and laws.
  • Data ethics is doing the right thing with data, considering the human impact from all sides, and making decisions based on your values.

[based on: Lawler, 2019]

“Just because we can do something, doesn’t mean we should.”

Should a company sell user information to political campaigns?

36 of 55

Anonymization (De-Identification)

After having run some anonymization system on our data, is everything fine?

Image Source: https://www.aclweb.org/anthology/2020.lrec-1.870/

HitzalMed

(Lopez et al., 2020)
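A naive de-identification sketch (toy regex patterns; real systems such as HitzalMed are far more sophisticated) and a reminder of why running an anonymizer does not automatically make everything fine:

```python
import re

PATTERNS = {
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-\s]?\d{3,4}[-\s]?\d{4}\b"),
}

def deidentify(text: str) -> str:
    # Replace each matched span with its category label.
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

note = "Patient seen on 12/03/2021, call 555-123-4567. The mayor's wife, treated for ..."
print(deidentify(note))
# Dates and phone numbers are masked, but "The mayor's wife" still identifies
# the patient -- indirect identifiers survive naive masking.
```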

37 of 55

Authorship Attribution / Author Profiling

What are potential chances and risks of this type of technology?

38 of 55

Reading Assignment / Discussion

Daniel J. Solove. 'I've Got Nothing to Hide' and Other Misunderstandings of Privacy. San Diego Law Review, Vol. 44, p. 745, 2007

Germany’s complicated relationship with Google Street View. NY Times, April 2013.

Questions to think about / discuss:

Which dimensions of privacy matter most to you?

A software developer accidentally notices a document in which a user is drafting a suicide note. Should they contact the police to save a life, or respect the user's secret?

Can you imagine a situation where interfering with someone’s privacy leads to an economic / financial issue for that person?

39 of 55

Word Error Rates for Automatic Captioning on YouTube (Tatman, 2017)

WER higher for Scottish speakers

WER higher for female speakers compared to male speakers

Image Source: https://www.aclweb.org/anthology/W17-1606
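To make such disparity measurements concrete, a minimal word error rate (WER) implementation via word-level edit distance; the example sentences are invented:

```python
# WER = (substitutions + deletions + insertions) / reference length
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # Standard dynamic-programming edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("close the shutters at night", "close the shutter at nine"))  # 0.4
```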

40 of 55

Demographic Factors Improve Classification Performance (Hovy, 2015)

[Chart: distribution of categories by gender; x-axis: topics]

Is it okay to leverage the author’s gender information as explicit features for text classification?

What would be recommended from a utilitarian / generalization perspective?

Image Source: https://www.aclweb.org/anthology/P15-1073.pdf

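Mechanically, "leveraging the author's gender as an explicit feature" just means appending a demographic column to the text features. A minimal sketch with invented data (Hovy (2015) works with real demographic metadata and richer models; this only shows the plumbing):

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

texts = ["review text one", "review text two", "review text three", "review text four"]
author_gender = np.array([[0], [1], [0], [1]])  # hypothetical author metadata
labels = [0, 1, 0, 1]                           # e.g., topic category

X_text = CountVectorizer().fit_transform(texts).toarray()
X = np.hstack([X_text, author_gender])          # text features + demographic feature

clf = LogisticRegression().fit(X, labels)
print(clf.predict(X))
```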

41 of 55

Reading Assignment

Prabhumoye et al.: Case Study: Deontological Ethics in NLP. NAACL 2021.

Consider a project that you are working on / have worked on or pick a recent research paper from the ACL Anthology. Analyse the method / system both from a utilitarian and from a generalization perspective. How would scholars of each ethical theory evaluate the ethicality of the method/system?

42 of 55

“Applications”: NLP for Social Good

Civility in communication: techniques to monitor trolling, hate speech, abusive language, detect fake news, etc.

Image Source: https://www.stiftung-nv.de/de/publikation/kurzanalyse-zu-trumps-crime-tweet-deutschland-viel-aufmerksamkeit-wenig-unterstuetzung

43 of 55

Other topics

Explainability

Crowdsourcing

Pollution

Safety

Pick a topic of your choice and research its relationship to ethics. What are common arguments made? Do you agree? Can you find interesting examples for ethical or unethical behavior?

44 of 55

Reading Suggestions:

Environmental Issues

Bender et al. (2021). On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? FAccT '21, March 3–10, 2021, Virtual Event, Canada.

Strubell, E., Ganesh, A., & McCallum, A. (2019). Energy and Policy Considerations for Deep Learning in NLP. ACL 2019.

45 of 55

Misc / Practical Hints

IRB = Institutional Review Board (Ethics Review Board)

Reviews all experiments involving human subjects

Find out how to contact the IRB of your institution.

The ACL has adopted the ACM Code of Ethics and Professional Conduct and has published an FAQ with hints on conducting research and publishing in an ethical manner (see, e.g., the ACL-IJCNLP 2021 Ethics FAQ).


46 of 55

Practical summary

Analyse data, task and outcomes for potential harm.

Can benefits outweigh harms?

47 of 55

Retrospection

What have you learned?

What does that mean for you personally?

What was surprising?

48 of 55

References

49 of 55

Literature –  Ethics in NLP 

Overviews

  • Dignum, V. (2019). Responsible artificial intelligence: how to develop and use Ai in a responsible way. Cham, Switzerland: Springer.

  • Fort, K., & Couillault, A. (2016). Yes, We Care! Results of the Ethics and Natural Language Processing Surveys. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC’16). Retrieved from https://www.aclweb.org/anthology/L16-1252 

  • Hovy, D., & Spruit, S. L. (2016). The Social Impact of Natural Language Processing. In K. Erk & N. A. Smith (Chairs), Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, Berlin, Germany. Retrieved from https://www.aclweb.org/anthology/P16-2096.pdf 

50 of 55

Literature –  Ethics in NLP 

Overviews

  • Leidner, J. L., & Plachouras, V. (2017). Ethical by Design: Ethics Best Practices for Natural Language Processing. In D. Hovy, S. Spruit, M. Mitchell, E. M. Bender, M. Strube, & H. Wallach (Chairs), Proceedings of the First ACL Workshop on Ethics in Natural Language Processing. Retrieved from https://www.aclweb.org/anthology/W17-1604.pdf

  • Mittelstadt, B. D., Allo, P., Taddeo, M., Wachter, S., & Floridi, L. (2016). The ethics of algorithms: Mapping the debate. Big Data & Society, 3(2). Retrieved from https://journals.sagepub.com/doi/pdf/10.1177/2053951716679679

  • Zweig, K. A. (2019). Ein Algorithmus hat kein Taktgefühl: wo Künstliche Intelligenz sich irrt, warum uns das betrifft und was wir dagegen tun können. München: Heyne.

51 of 55

Literature –  Ethics in NLP 

Bias

  • Tatman, R. (2017). Gender and Dialect Bias in YouTube's Automatic Captions. In D. Hovy, S. Spruit, M. Mitchell, E. M. Bender, M. Strube, & H. Wallach (Chairs), Proceedings of the First ACL Workshop on Ethics in Natural Language Processing. Retrieved from https://www.aclweb.org/anthology/W17-1606.pdf

  • Prates, M. O. R., Avelar, P. H., & Lamb, L. C. (2019). Assessing gender bias in machine translation: A case study with Google Translate. Neural Computing and Applications, 14(1). Retrieved from https://arxiv.org/pdf/1809.02208.pdf 

  • Stanovsky, G., Smith, N. A., & Zettlemoyer, L. (2019). Evaluating Gender Bias in Machine Translation. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Retrieved from https://www.aclweb.org/anthology/P19-1164.pdf

52 of 55

Literature –  Ethics in NLP 

Bias

  • Davidson, T., Bhattacharya, D., & Weber, I. (2019). Racial Bias in Hate Speech and Abusive Language Detection Datasets. In Proceedings of the Third Workshop on Abusive Language Online (pp. 25–35). Association for Computational Linguistics. Retrieved from https://www.aclweb.org/anthology/W19-3504.pdf 

  • Sap, M., Card, D., Gabriel, S., Choi, Y., & Smith, N. A. (2019). The Risk of Racial Bias in Hate Speech Detection. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (pp. 1668–1678). Association for Computational Linguistics. Retrieved from https://www.aclweb.org/anthology/P19-1163.pdf

53 of 55

Literature –  Ethics in NLP 

Bias

  • Caliskan, A., Bryson, J. J., & Narayanan, A. (2017). Semantics derived automatically from language corpora contain human-like biases. Science (New York, N.Y.), 356(6334), 183–186. Retrieved from https://arxiv.org/pdf/1608.07187.pdf

  • Sheng, E., Chang, K.-W., Natarajan, P., & Peng, N. (2019). The Woman Worked as a Babysitter: On Biases in Language Generation. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) (pp. 3405–3410). Association for Computational Linguistics. Retrieved from https://www.aclweb.org/anthology/D19-1339.pdf

  • Zhao, J., Wang, T., Yatskar, M., Ordonez, V., & Chang, K.-W. (2017). Men Also Like Shopping: Reducing Gender Bias Amplification using Corpus-level Constraints. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (pp. 2979–2989). Association for Computational Linguistics. Retrieved from https://www.aclweb.org/anthology/D17-1323.pdf

54 of 55

Literature –  Ethics in NLP 

Fairness

  • Loukina, A., Madnani, N., & Zechner, K. (2019). The many dimensions of algorithmic fairness in educational applications. In Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications (pp. 1–10). Association for Computational Linguistics. Retrieved from https://www.aclweb.org/anthology/W19-4401.pdf

55 of 55

Literature –  Ethics in NLP 

Gender Stereotypes

  • Bhaskaran, J., & Bhallamudi, I. (2019). Good Secretaries, Bad Truck Drivers? Occupational Gender Stereotypes in Sentiment Analysis. In Proceedings of the First Workshop on Gender Bias in Natural Language Processing (pp. 62–68). Association for Computational Linguistics. Retrieved from https://www.aclweb.org/anthology/W19-3809.pdf