1 of 61

CMPSC 442: Artificial Intelligence

Lecture 16. AI Ethics

Rui Zhang

Spring 2024


2 of 61

3 of 61

How should AI systems behave, and who should decide?


https://openai.com/blog/how-should-ai-systems-behave/

4 of 61

The Human Factor in NLP

"The common misconception is that language has to do with words and what they mean. It doesn’t. It has to do with people and what they mean."

--- Herbert H. Clark & Michael F. Schober, 1992


5 of 61

Harm Caused by Bias in NLP Technology


6 of 61

Harm Caused by Bias in NLP Technology


https://www.theguardian.com/technology/2017/oct/24/facebook-palestine-israel-translates-good-morning-attack-them-arrest

7 of 61

Gender Bias in Word Embeddings
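A common way to surface this bias is to probe analogy directions in pretrained embeddings. Below is a minimal sketch using gensim and a public GloVe model; the specific model and word pairs are illustrative choices, not taken from the slide.

```python
import gensim.downloader as api

# Load a small public pretrained embedding (illustrative model choice).
model = api.load("glove-wiki-gigaword-100")

# Analogy probe: "man is to doctor as woman is to ?"
# Stereotyped completions (e.g., "nurse") expose gender bias in the vectors.
print(model.most_similar(positive=["doctor", "woman"], negative=["man"], topn=5))

# Direct comparison along a gendered direction.
print(model.similarity("doctor", "he"), model.similarity("doctor", "she"))
```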


8 of 61

Gender Bias in Text-to-Image Retrieval

Image search query “Doctor” (June 2017)


Slide Credit: Yulia Tsvetkov

9 of 61

Gender Bias in Text-to-Image Retrieval

Image search query “Nurse” (June 2017)


Slide Credit: Yulia Tsvetkov

10 of 61

Gender Bias in Machine Translation


https://arxiv.org/pdf/1809.02208.pdf

11 of 61

Gender Bias in Machine Translation

Google Translate renders gender-neutral Turkish sentences into gendered English, defaulting to stereotyped choices (e.g., the genderless pronoun "o" becomes "he" for a doctor and "she" for a nurse).


https://blog.google/products/translate/reducing-gender-bias-google-translate/

12 of 61

Social/Racial Bias in NLG of Dialog Systems


https://aclanthology.org/2020.findings-emnlp.291.pdf

13 of 61

Human Bias in Data

Human Reporting Bias

  • The frequency with which people write about actions, outcomes, or properties is not a reflection of real-world frequencies or the degree to which a property is characteristic of a class of individuals.
  • e.g., "Doctor" vs "Female Doctor"
  • e.g., "Banana" vs "Yellow Banana"


14 of 61

Human Bias in Data Collection and Annotation

Selection Bias

  • Selection does not reflect a random sample


http://turktools.net/crowdsourcing/

https://ai.googleblog.com/2018/09/introducing-inclusive-images-competition.html

15 of 61

Inductive Bias

The assumptions used by our model

  • recurrent neural networks for NLP assume that the sequential ordering of words is meaningful
  • features in discriminative models are assumed to be useful to map inputs to outputs


https://people.cs.umass.edu/~miyyer/cs685_f20/slides/18-ethics.pdf

16 of 61

Bias Amplification in Learned Models

Figure: dataset gender bias vs. model bias after training; the learned model amplifies the bias present in the training data.

Slide Credit: Mark Yatskar

17 of 61

Human Bias in Interpretation

Confirmation Bias: The tendency to search for, interpret, favor, and recall information in a way that confirms preexisting beliefs.

Overgeneralization: Coming to a conclusion based on information that is too general or not specific enough (related: overfitting).

Correlation Fallacy: Confusing correlation with causation.

Automation Bias: The propensity for humans to favor suggestions from automated decision-making systems over contradictory information made without automation.


Slide Credit: Margaret Mitchell

18 of 61

Algorithmic bias: Unjust, unfair, or prejudicial treatment of people related to race, income, sexual orientation, religion, gender, and other characteristics historically associated with discrimination and marginalization, when and where they manifest in algorithmic systems or algorithmically aided decision-making.


Diagram: human bias enters at every stage of the pipeline: data collection, data annotation, model training, and result interpretation.

19 of 61

Understand and Document your Data


https://arxiv.org/pdf/1803.09010.pdf

20 of 61

Also Be Responsible for your Model


https://arxiv.org/pdf/1810.03993.pdf

21 of 61

AI Ethics

Fairness and Bias

Security and Privacy

Transparency and Explainability

22 of 61

Appendix

23 of 61

Fair Abstractive Summarization of Diverse Perspectives

Yusen Zhang, Nan Zhang, Yixin Liu, Alexander Fabbri, Junru Liu, Ryo Kamoi

Xiaoxin Lu, Caiming Xiong, Jieyu Zhao, Dragomir Radev, Kathleen McKeown, Rui Zhang

NAACL 2024


24 of 61

Are Large Language Models Fair Summarizers?


25 of 61

Conflicting Product Reviews


26 of 61

Diverse Perspectives and Conflicting Opinions

  • Product and Restaurant Reviews
  • Political Stances
  • Legal Cases
  • Scientific Debates

27 of 61

Value Pluralism and Fairness of Summarization

Value Pluralism: There are several values which may be equally correct and fundamental, and yet in conflict with each other.

Fair Summarization: A fair summary of user-generated data provides an accurate and comprehensive view of the various perspectives held by these groups.


28 of 61

PerspectiveSumm: A Benchmark for Fair Abstractive Summarization

Characteristics

  • Quality: Human-written inputs marked with clear, precise social attributes
  • Diversity: Cover various domains, forms, attributes, and values


29 of 61

PerspectiveSumm: Examples of Claritin and US Election


30 of 61

PerspectiveSumm: Examples of Yelp and Amazon


31 of 61

PerspectiveSumm: Examples of Supreme Court and IQ2 Debate


32 of 61

Summarization of Diverse Perspectives with Social Attributes

Social attributes indicate the properties that form groups of people:

  • Sentiment: Positive, Negative
  • Gender: Male, Female, Other
  • Party: Conservatism, Liberalism, Moderates

Diagram: sources (Source 1: positive review, Source 2: negative review) mapped to a target summary containing positive, negative, and neutral segments.

33 of 61

Definition of Fairness of Summarization

Figure: example contrasting a fair summary with an unfair summary.

34 of 61

Probing Fairness of LLMs through Summarization

Figure: fair vs. unfair summary examples when probing LLMs.

35 of 61

Existing Metrics are not Enough for Evaluating Fairness

  • We do not always have reference summaries.
  • Even if reference summaries are available, they are not always fair. (In fact, our later experiments show they are often not fair.)
  • Even if the reference summaries are fair, existing metrics (ROUGE/BLEU/BERTScore) capture similarity but cannot capture the notion of fairness.


36 of 61

Our Approach to Quantifying Fairness of Summaries


1. Quantify the distribution of values in both sources and targets.

2. Quantify the differences of value distributions between sources and targets.

37 of 61

Our Approach to Quantifying Fairness of Summaries


1. Quantify the distribution of values in both sources and targets.

2. Quantify the differences of value distributions between sources and targets.

38 of 61

Value Distribution of Text

We view text as a probability distribution of semantic units, e.g., tokens.

Each semantic unit maps to social attribute values.

This gives us the value distribution of the text!


39 of 61

Value Distribution of Source

Diagram: source text (positive and negative reviews) mapped to a source value distribution.

40 of 61

Value Distribution of Source


This is easy, as the metadata already contains the values: we can simply count the number of tokens for each value.
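A minimal counting sketch, assuming each source unit carries its attribute value in the metadata; the dictionary field names are illustrative.

```python
from collections import Counter

def source_value_distribution(sources):
    """Count tokens per attribute value and normalize to a distribution.

    sources: list of dicts like {"text": "...", "value": "positive"},
    where "value" is the social attribute taken from the metadata.
    """
    counts = Counter()
    for item in sources:
        counts[item["value"]] += len(item["text"].split())  # token count per value
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()}

# Example:
# source_value_distribution([
#     {"text": "great battery life, love it", "value": "positive"},
#     {"text": "screen cracked after a week", "value": "negative"},
# ])
```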

41 of 61

Value Distribution of Target

Diagram: target text (positive, negative, and neutral summary content) mapped to a target value distribution.

42 of 61

Value Distribution of Target

This is not easy due to the abstractive nature of summaries!

We explore two methods for estimating it (see the sketch after this list):

  1. N-gram Matching: find n-gram overlap between target and source for hard matching
  2. Neural Matching: use BERTScore/BARTScore for soft matching
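A minimal sketch of the hard n-gram (here, unigram) matching idea, reusing the labeled sources from above; the handling of ambiguous or unmatched tokens is a simplifying assumption.

```python
from collections import Counter

def target_value_distribution(summary, sources):
    """Attribute each summary token to a value via hard unigram overlap.

    summary: the generated summary text.
    sources: list of dicts like {"text": "...", "value": "positive"}.
    Tokens that match several values, or none, are skipped here for
    simplicity (a real implementation must resolve such cases).
    """
    vocab_by_value = {}
    for item in sources:
        vocab_by_value.setdefault(item["value"], set()).update(item["text"].split())

    counts = Counter()
    for token in summary.split():
        matches = [v for v, vocab in vocab_by_value.items() if token in vocab]
        if len(matches) == 1:          # unambiguous match only
            counts[matches[0]] += 1
    total = sum(counts.values()) or 1
    return {value: n / total for value, n in counts.items()}
```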


43 of 61

Our Approach to Quantifying Fairness of Summaries

43

1. Quantify the distribution of values in both sources and targets.

2. Quantify the differences of value distributions between sources and targets.

44 of 61

Summarization Fairness - Ratio Fairness

The target value distribution should follow the source value distribution.

Diagram: sources (Source 1: positive review, Source 2: negative review) and a target summary with positive, negative, and neutral segments.

45 of 61

Summarization Fairness - Equal Fairness

The target value distribution should follow the uniform value distribution, regardless of the source.

Diagram: sources (Source 1: positive review, Source 2: negative review) and a target summary with positive, negative, and neutral segments.

46 of 61

Summarization Fairness - User-Defined Fairness

The target value distribution should follow a user-defined distribution.

Diagram: sources (Source 1: positive review, Source 2: negative review) and a target summary.
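The three notions differ only in the reference distribution the target is compared against. A minimal sketch, read directly off the definitions above; the function and argument names are illustrative.

```python
def reference_distribution(kind, source_dist, user_dist=None):
    """Pick the reference distribution for a given fairness notion.

    kind: "ratio" -> follow the source value distribution
          "equal" -> follow the uniform distribution over values
          "user"  -> follow a user-defined distribution
    """
    if kind == "ratio":
        return dict(source_dist)
    if kind == "equal":
        values = list(source_dist)
        return {v: 1.0 / len(values) for v in values}
    if kind == "user":
        return dict(user_dist)
    raise ValueError(f"unknown fairness notion: {kind}")
```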

47 of 61

Metric 1 - Binary Unfair Rate (BUR)

Definition.

Binary Unfair Rate (BUR) outputs 1 if the sample is unfair and 0 otherwise.

A summary is fair if and only if, for every social attribute value, its share of the target value distribution is at least its share of the reference distribution.

This means no value is under-represented.
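A minimal sketch of such a check, assuming value distributions are dictionaries mapping attribute values to proportions; the tolerance argument is an illustrative detail, not from the slides.

```python
def binary_unfair(target_dist, reference_dist, tol=1e-6):
    """Return 1 if any value is under-represented in the target, else 0.

    target_dist / reference_dist: dicts mapping attribute values
    (e.g., "positive", "negative") to proportions summing to 1.
    """
    for value, ref_share in reference_dist.items():
        if target_dist.get(value, 0.0) < ref_share - tol:
            return 1  # at least one value is under-represented -> unfair
    return 0  # no value is under-represented -> fair
```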


48 of 61

Metric 2 - Unfair Error Rate (UER)

Definition.

Unfair Error Rate (UER) measures the distance between value distributions of sources and targets.

It computes the average percentage of values that are underrepresented.
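A hedged sketch of one plausible reading of this metric, averaging the per-value shortfall relative to the reference distribution; the exact formula in the paper may differ.

```python
def unfair_error_rate(target_dist, reference_dist):
    """One plausible reading of UER: the average shortfall (in percentage
    points) of each value's target share relative to its reference share."""
    shortfalls = [
        max(0.0, ref_share - target_dist.get(value, 0.0))
        for value, ref_share in reference_dist.items()
    ]
    return 100.0 * sum(shortfalls) / len(shortfalls)
```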


49 of 61

Sanity Check on Our Metric Quality by Extreme Synthetic Examples

We create pseudo-summaries by sampling from the source:

  • Biased Summary: Sample only male tweets
  • Balanced Summary: Sample both male and female tweets with balanced ratios


Our metrics do capture the difference in value distributions, and can thus be used to measure fairness.
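A minimal sketch of how such extreme pseudo-summaries could be constructed, assuming the source tweets come as (text, gender) pairs; the sampling details are illustrative.

```python
import random

def make_pseudo_summary(tweets, mode, k=10, seed=0):
    """Build an extreme pseudo-summary by sampling source tweets.

    tweets: list of (text, gender) pairs, gender in {"male", "female"}.
    mode:   "biased"   -> sample only male tweets
            "balanced" -> sample male and female tweets in equal numbers
    """
    rng = random.Random(seed)
    males = [t for t, g in tweets if g == "male"]
    females = [t for t, g in tweets if g == "female"]
    if mode == "biased":
        chosen = rng.sample(males, min(k, len(males)))
    else:  # balanced
        half = k // 2
        chosen = (rng.sample(males, min(half, len(males)))
                  + rng.sample(females, min(half, len(females))))
    return " ".join(chosen)
```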

50 of 61

Sanity Check on Our Metric Quality by Human Evaluation

We perform a two-stage human evaluation to understand how humans perceive the fairness of abstractive summaries.


  1. Sentence Fact Identification
  2. Summary Fairness Identification

51 of 61

Sanity Check on Our Metric Quality by Human Evaluation


High correlation between the proposed metrics and human evaluation.

52 of 61

How fair are the abstractive summaries generated by LLMs?


  • Many summaries generated by LLMs are not fair, as judged by our automatic metrics.
  • gpt-3.5-turbo and gpt-4 are in general better than the older text-davinci-003 and other small open-source models.
  • But we do not find strong evidence that gpt-4 is better than gpt-3.5-turbo.

53 of 61

How fair are the abstractive summaries generated by LLMs?


While an individual summary can be unfair, over the entire test set models do not generate more unfair summaries for one side than for the other.

54 of 61

How do humans perceive the fairness of abstractive summaries?


Our results indicate that many summaries generated by LLMs are not fair, as judged by our human evaluators.

55 of 61

How fair are the existing human-written reference summaries?


Interestingly, existing reference summaries are not fair either; they are even less fair than LLM-generated summaries.

56 of 61

How can we improve fairness of abstractive summarization?

We experimented with three simple ways to improve fairness, without modifying the LLMs themselves (see the sketch after this list):

  • Improving through Instructions
  • Improving through Hyperparameters
    • Summary Length
    • Decoding Temperature
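A minimal sketch of the instruction-prompting idea using an OpenAI-style chat API; the model name, prompt wording, temperature, and length values are illustrative assumptions, not the exact setup from the paper.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def fair_summarize(reviews, temperature=1.0, max_tokens=150):
    """Prompt an LLM for a summary with an explicit fairness instruction.

    reviews: list of review strings with mixed sentiments.
    The instruction wording, temperature, and length are illustrative;
    the slides only report that such knobs alleviate, but do not
    eliminate, unfairness.
    """
    prompt = (
        "Summarize the following reviews. Cover positive and negative "
        "opinions in proportion to how often they appear in the input, "
        "so that no perspective is under-represented.\n\n"
        + "\n".join(f"- {r}" for r in reviews)
    )
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",                   # assumed model choice
        messages=[{"role": "user", "content": prompt}],
        temperature=temperature,                 # higher values gave fairer summaries
        max_tokens=max_tokens,                   # medium lengths worked best
    )
    return response.choices[0].message.content
```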


57 of 61

How can we improve fairness of abstractive summarization?


58 of 61

How can we improve fairness of abstractive summarization?


Instruction prompting only alleviates, but does not eliminate, the issue.

59 of 61

How can we improve fairness of abstractive summarization?


Medium summary length works best: when there are too many or too few sentences, balancing the values in the summary is more difficult.

60 of 61

How can we improve fairness of abstractive summarization?


A higher decoding temperature helps because it allows more diverse generation, which improves fairness.

61 of 61

Conclusions and Future Work

  • Define fair abstractive summarization over diverse perspectives.
  • Collect a dataset PerspectiveSumm over various domains and social attributes.
  • Quantify the fairness of abstractive summarization by proposing several new metrics.
  • Benchmark LLMs to show that many summaries generated by LLMs are not fair, as judged by both our metrics and human evaluators.
  • Future work: While some simple methods can help, we need fundamental solutions: fairness-aware RLHF, fine-grained controllable generation, and complex-instruction-following capability.
