Understanding The Relation Between Noise And Bias In Annotated Datasets
Abhishek Anand, Anweasha Saha, Prathyusha Naresh Kumar,
Ashwin Rao, Zihao He, Negar Mokhberian
Information Sciences Institute
Motivation
Datasets
All three datasets target toxicity or hate speech:

| | SBIC [1] | Kennedy [2] | Agree To Disagree [3] |
|---|---|---|---|
| # Annotators | 307 | 7,912 | 819 |
| # Annotations per annotator | 479.3 ± 829.6 | 17.1 ± 3.8 | 63.7 ± 139 |
| # Unique texts | 45,318 | 39,565 | 10,440 |
| # Annotations per text | 3.2 ± 1.2 | 2.3 ± 1.0 | 5 |
| # Labels | 2 | 3 | 2 |
Methods
Two Regimes of Classification
Model and Performance
Model
We trained two models per dataset:

| | Majority Label Model | Multi-annotator Model |
|---|---|---|
| Model | RoBERTa-base [6] | DISCO [7] |
| Epochs | 5 | 5 |
| Learning rate | 5e-5 | 2e-3 |
| Batch size | 32 | 200 |
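The Majority Label Model is trained on a single aggregated label per text, obtained by majority vote over the annotators. A minimal sketch of that aggregation step (function and variable names are illustrative, not from the actual pipeline):

```python
from collections import Counter

def majority_label(annotations):
    """Return the most frequent label among annotator votes for one text.

    Ties are broken by first-seen order, a simplifying assumption.
    """
    return Counter(annotations).most_common(1)[0][0]

# Toy annotation table: text id -> list of annotator votes
votes = {
    "t1": ["offensive", "offensive", "not_offensive"],
    "t2": ["not_offensive", "not_offensive", "offensive"],
}
gold = {text: majority_label(labels) for text, labels in votes.items()}
print(gold)
```

The resulting `gold` mapping is what a single ground-truth classifier would then be fine-tuned on.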
Performance

| Dataset | F1 score (majority) | F1 score (multi-annotator) |
|---|---|---|
| Agree To Disagree | 0.78 | 0.78 |
| Kennedy | 0.68 | 0.75 |
| SBIC | 0.80 | 0.78 |
Uncertainty in Machine Learning Predictions
Results - Dataset Cartography
Agree To Disagree
Kennedy

SBIC
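Dataset cartography places each training instance on a confidence/variability map: confidence is the mean probability the model assigns to the instance's gold label across training epochs, and variability is its standard deviation across those epochs. A minimal sketch with toy per-epoch probabilities (names and values are illustrative):

```python
from statistics import mean, pstdev

# probs[i] = per-epoch probabilities the model assigns to instance i's
# gold label (toy values; 5 epochs per instance).
probs = {
    "easy":      [0.9, 0.9, 0.9, 0.9, 0.9],
    "ambiguous": [0.5, 0.6, 0.4, 0.7, 0.5],
    "hard":      [0.2, 0.3, 0.2, 0.1, 0.2],
}

# Cartography coordinates: confidence = mean, variability = population std.
confidence = {k: mean(v) for k, v in probs.items()}
variability = {k: pstdev(v) for k, v in probs.items()}

# Low confidence together with low variability flags hard-to-learn,
# possibly mislabeled instances.
print(confidence, variability)
```

Instances in the low-confidence, low-variability corner of the map are the ones examined in the results below.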
1st Regime of Classification
Results - Single Ground Truth Model
Agree To Disagree
| Pearson's R | P-value |
|---|---|
| 0.44 | 0.0 |
Results - Single Ground Truth Model
Kennedy
| Pearson's R | P-value |
|---|---|
| 0.45 | 0.0 |
Results - Single Ground Truth Model
SBIC
| Pearson's R | P-value |
|---|---|
| 0.37 | 0.0 |
Findings from Single Ground Truth model
2nd Regime of Classification
Research Question: Does modeling each annotator's vote separately yield better confidence on the low-confidence instances?
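DISCO's actual architecture is described in [7]; purely as a simplified illustration of the idea (not DISCO itself), a multi-annotator model scores each (text, annotator) pair, for example by combining a shared per-text score with a learned per-annotator offset:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Toy setup (illustrative, not DISCO): a shared per-text score plus a
# per-annotator offset modeling each annotator's strictness.
text_scores = [2.0, -1.0, 0.3]       # shared "offensiveness" score per text
annotator_bias = [-0.5, 0.0, 1.0]    # stricter annotators -> higher bias

# p[i][a] = predicted probability that annotator a labels text i offensive
p = [[sigmoid(s + b) for b in annotator_bias] for s in text_scores]

for row in p:
    print([round(x, 2) for x in row])
```

Near-unanimous per-annotator predictions for a text indicate high confidence; disagreement across the predicted votes indicates the low-confidence instances the research question targets.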
Results - Multi Annotator Model
Agree To Disagree
| Pearson's R | P-value |
|---|---|
| 0.46 | 0.0 |

Correlation between agreement factor and confidence
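The reported correlations have the following shape: a per-instance agreement factor (assumed here to be the fraction of annotators voting for the majority label) is correlated with the model's confidence on that instance. A toy sketch with made-up numbers:

```python
from math import sqrt

def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient (no external dependencies)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def agreement_factor(votes):
    """Fraction of annotators choosing the instance's majority label
    (assumed definition; e.g. ["off", "off", "not"] -> 2/3)."""
    return max(votes.count(v) for v in set(votes)) / len(votes)

# Toy per-instance annotator votes and model confidences
votes_per_text = [
    ["off", "off", "off"],          # unanimous
    ["off", "off", "not"],          # 2/3 agree
    ["off", "not", "not", "off"],   # split
]
confidences = [0.95, 0.70, 0.55]

agreement = [agreement_factor(v) for v in votes_per_text]
print(round(pearson_r(agreement, confidences), 2))
```

A positive r, as in the tables, means the model is systematically less confident on instances where annotators disagree.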
Results - Multi Annotator Model
SBIC
| Pearson's R | P-value |
|---|---|
| 0.43 | 0.0 |

Correlation between agreement factor and confidence
Results - Multi Annotator Model
Kennedy
| Pearson's R | P-value |
|---|---|
| 0.50 | 0.0 |

Correlation between agreement factor and confidence

This dataset is especially challenging because the average number of annotations per annotator is only ~17.
Results - Multi-Annotator Model
High confidence in the Single Ground Truth model but low confidence in the Multi-Annotator model:

| Text | Annotation |
|---|---|
| I thought you were going to win this in a landslide | Offensive |
| crossing my fingers for you from Germany. Even though I am not an American. | Offensive |
| boy F**K you | Not Offensive |
| Fragility at its finest | Offensive |

Mislabeled annotations!
Findings from Multi-annotator model
References