Tutorial on
Combating Online Hate Speech:
The Role of Content, Networks, Psychology, User Behavior, etc.
Our Team
Sarah Masud IIIT-D, India
Pinkesh Badjatiya Microsoft, India
Amitava Das Wipro, India
Manish Gupta Microsoft, India
Tanmoy Chakraborty
IIIT-D, India
Preslav Nakov
QCRI, Qatar
Available at: https://hatewash.github.io/#outline
Tutorial Outline
Why Study Hate Speech?
Various Forms of Malicious Online Content
Cyberbullying
Abuse
Profanity
Offense
Aggression
Provocation
Toxicity
Spam
Fake News
Rumours
Hate Speech
Trolling
Personal Attacks
Fraud
[1] https://pubmed.ncbi.nlm.nih.gov/15257832/
Statistics of Hate Speech Prevalence
1,134 Americans surveyed from Dec 17 to Dec 27, 2018
Anti-Defamation League https://www.adl.org/onlineharassment
Percentage of U.S. Adults Who Have Experienced Harassment Online
Reasons for Online Hate
Percentage of Respondents Who Were Targeted Because of Their Membership in a Protected Class
Hate speech on the Internet is an age-old problem
Fig 1: https://en.wikipedia.org/wiki/Controversial_Reddit_communities
Fig 2: https://www.youtube.com/watch?v=1ndq79y1ar4
Fig 3: https://theconversation.com/hate-speech-is-still-easy-to-find-on-social-media-106020
Fig 4: https://twitter.com/AdhirajGabbar/status/1348145356282884097
Fig 1: List of Extremist/Controversial Subreddits
Fig 2: YouTube Video Inciting Violence and Hate Crime
Fig 3: Twitter Hate Speech
Fig 4: Twitter Offensive Speech
Internet platforms' policies w.r.t. curbing hate
Some famous platforms with stricter policies:
Flag bearers of free speech (and, in effect, homes for hate speech): unmoderated platforms
[2]: Characterizing (Un)moderated Textual Data in Social Systems
Ill Effects of Hate Speech
1,134 Americans surveyed from Dec 17 to Dec 27, 2018
Anti-Defamation League https://www.adl.org/onlineharassment
Harassment of Daily Users of Platforms
Impact of Online Hate and Harassment
Societal Impact of Online Hate and Harassment
Why is studying hate speech detection critical?
An automated cyber hate classification system could support more proactive public order management in the first two weeks following an event.
https://l1ght.com/Toxicity_during_coronavirus_Report-L1ght.pdf
Fortuna, P., Nunes, S.: A survey on automatic detection of hate speech in text. ACM Computing Surveys (CSUR) 51(4), 1–30 (2018)
Burnap, P., Williams, M.L.: Us and them: identifying cyber hate on twitter across multiple protected characteristics. EPJ Data Science 5, 1–15 (2016)
Definition of hate speech
Attacks targeting people on the basis of ethnicity, national origin, religious affiliation, sexual orientation, sex, gender, descent, or serious disability or disease.
Language that is derogatory, encourages violence, or aims to dehumanize (comparing people to non-human things, e.g., animals), insult, or promote or justify hatred, discrimination, or hostility.
Badjatiya, Pinkesh, Gupta, S., Gupta, Manish, Varma, Vasudeva: Deep learning for hate speech detection in tweets. In: Proc. of the 26th Intl. Conf. on World Wide Web Companion, pp. 759–760 (2017)
Bhardwaj, M., Akhtar, M.S., Ekbal, A., Das, Amitava, Chakraborty, Tanmoy: Hostility detection dataset in Hindi. arXiv preprint arXiv:2011.03588 (2020)
Davidson, T., Warmsley, D., Macy, M., Weber, I.: Automated hate speech detection and the problem of offensive language. In: Proc. of the Intl. AAAI Conf. on Web and Social Media, vol. 11 (2017)
Fortuna, P., Nunes, S.: A survey on automatic detection of hate speech in text. ACM Computing Surveys (CSUR) 51(4), 1–30 (2018)
YouTube, Facebook, Twitter
Kiela, D., Firooz, H., Mohan, A., Goswami, V., Singh, A., Ringshia, P., Testuggine, D.: The hateful memes challenge: Detecting hate speech in multimodal memes. Advances in Neural Information Processing Systems 33 (2020)
MacAvaney, S., Yao, H.R., Yang, E., Russell, K., Goharian, N., Frieder, O.: Hate speech detection: Challenges and solutions. PLoS ONE 14(8), e0221152 (2019)
https://www.adl.org/sites/default/files/documents/pyramid-of-hate.pdf
Hate Speech Detection
Agenda
Popular social network datasets
Waseem, Zeerak, and Dirk Hovy. "Hateful symbols or hateful people? predictive features for hate speech detection on twitter." In Proceedings of the NAACL student research workshop, pp. 88-93. 2016.
Bhardwaj, M., Akhtar, M.S., Ekbal, A., Das, Amitava, Chakraborty, Tanmoy: Hostility detection dataset in Hindi. arXiv preprint arXiv:2011.03588 (2020)
Wijesiriwardene, Thilini, Hale Inan, Ugur Kursuncu, Manas Gaur, Valerie L. Shalin, Krishnaprasad Thirunarayan, Amit Sheth, and I. Budak Arpinar. "Alone: A dataset for toxic behavior among adolescents on twitter." In International Conference on Social Informatics, pp. 427-439. Springer, Cham, 2020.
Chandra, M., Pathak, A., Dutta, E., Jain, P., Gupta, Manish, Shrivastava, M., Kumaraguru, P.: Abuseanalyzer: Abuse detection, severity and target prediction for gab posts. In: Proc. of the 28th Intl. Conf. on Computational Linguistics, pp. 6277–6283 (2020)
Davidson, T., Warmsley, D., Macy, M., Weber, I.: Automated hate speech detection and the problem of offensive language. In: Proc. of the Intl. AAAI Conf. on Web and Social Media, vol. 11 (2017)
Other popular datasets
Homa Hosseinmardi, Sabrina Arredondo Mattson, Rahat Ibn Rafiq, Richard Han, Qin Lv, and Shivakant Mishra. 2015. Analyzing labeled cyberbullying incidents on the Instagram social network. In SocInfo. Springer, 49–66.
Rahat Ibn Rafiq, Homa Hosseinmardi, Richard Han, Qin Lv, Shivakant Mishra, and Sabrina Arredondo Mattson. 2015. Careful what you share in six seconds: Detecting cyberbullying instances in Vine. In ASONAM. ACM, 617–622.
Zhong, H., Li, H., Squicciarini, A.C., Rajtmajer, S.M., Griffin, C., Miller, D.J., Caragea, C.: Content-driven detection of cyberbullying on the Instagram social network. In: IJCAI, vol. 16, pp. 3952–3958 (2016)
Kiela, D., Firooz, H., Mohan, A., Goswami, V., Singh, A., Ringshia, P., Testuggine, D.: The hateful memes challenge: Detecting hate speech in multimodal memes. Advances in Neural Information Processing Systems 33 (2020)
Gomez, R., Gibert, J., Gomez, L., Karatzas, D.: Exploring hate speech detection in multi-modal publications. In: Proc. of the IEEE/CVF Winter Conf. on Applications of Computer Vision, pp. 1470–1478 (2020)
Juuti, M., Gröndahl, T., Flanagan, A., Asokan, N.: A little goes a long way: Improving toxic language classification despite data scarcity. In: Proc. of the 2020 Conf. on Empirical Methods in Natural Language Processing: Findings, pp. 2991–3009 (2020)
Other popular datasets
Karlekar, S., Bansal, M.: Safecity: Understanding diverse forms of sexual harassment personal stories. In: Proc. of the 2018 Conf. on Empirical Methods in Natural Language Processing. pp. 2805–2811 (2018)
Nobata, C., Tetreault, J., Thomas, A., Mehdad, Y., Chang, Y.: Abusive language detection in online user content. In: Proc. of the 25th Intl. Conf. on world wide web. pp. 145–153 (2016)
Parikh, P., Abburi, H., Badjatiya, Pinkesh, Krishnan, R., Chhaya, N., Gupta, M., Varma, Vasudeva: Multi-label categorization of accounts of sexism using a neural framework. In: Proc. of the 2019 Conf. on Empirical Methods in Natural Language Processing and the 9th Intl. Joint Conf. on Natural Language Processing (EMNLP-IJCNLP), pp. 1642–1652 (2019)
Silva, L., Mondal, M., Correa, D., Benevenuto, F., Weber, I.: Analyzing the targets of hate in online social media. In: Proc. of the Intl. AAAI Conf. on Web and Social Media. vol. 10 (2016)
Agenda
Basic set of NLP features
Gitari, Njagi Dennis, Zhang Zuping, Hanyurwimfura Damien, and Jun Long. "A lexicon-based approach for hate speech detection." International Journal of Multimedia and Ubiquitous Engineering 10, no. 4 (2015): 215-230.
Fortuna, P., Nunes, S.: A survey on automatic detection of hate speech in text. ACM Computing Surveys (CSUR) 51(4), 1–30 (2018)
Burnap, P., Williams, M.L.: Us and them: identifying cyber hate on twitter across multiple protected characteristics. EPJ Data Science 5, 1–15 (2016)
Djuric, Nemanja, Jing Zhou, Robin Morris, Mihajlo Grbovic, Vladan Radosavljevic, and Narayan Bhamidipati. "Hate speech detection with comment embeddings." In Proceedings of the 24th international conference on world wide web, pp. 29-30. 2015.
Davidson, T., Warmsley, D., Macy, M., Weber, I.: Automated hate speech detection and the problem of offensive language. In: Proc. of the Intl. AAAI Conf. on Web and Social Media. vol. 11 (2017)
More features
Nobata, C., Tetreault, J., Thomas, A., Mehdad, Y., Chang, Y.: Abusive language detection in online user content. In: Proc. of the 25th Intl. Conf. on world wide web. pp. 145–153 (2016)
Classifiers/Regressors
Bhardwaj, M., Akhtar, M.S., Ekbal, A., Das, Amitava, Chakraborty, Tanmoy: Hostility detection dataset in Hindi. arXiv preprint arXiv:2011.03588 (2020)
MacAvaney, S., Yao, H.R., Yang, E., Russell, K., Goharian, N., Frieder, O.: Hate speech detection: Challenges and solutions. PLoS ONE 14(8), e0221152 (2019)
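To make the classical pipeline above concrete, here is a minimal sketch (ours, not from the cited papers) of character n-gram TF-IDF features feeding a logistic regression classifier, in the spirit of Davidson et al. (2017); the two-example corpus is a placeholder.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["example tweet one", "example tweet two"]  # placeholder corpus
labels = [0, 1]                                     # 0 = benign, 1 = hateful

# Character n-grams are robust to misspellings; TF-IDF weighs them.
clf = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
    LogisticRegression(max_iter=1000),
)
clf.fit(texts, labels)
print(clf.predict(["another example tweet"]))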
Agenda
Basic architectures
[Suvarna et al. 2020]
Skipped CNNs
Zhang, Z., Luo, L.: Hate speech detection: A solved problem? The challenging case of long tail on twitter. Semantic Web 10(5), 925–945 (2019)
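As an illustration of such a basic architecture, the following is a minimal PyTorch sketch of an embedding layer feeding a single LSTM with a binary classification head; all sizes here are illustrative assumptions, not taken from any of the cited papers.

import torch
import torch.nn as nn

class LSTMHateClassifier(nn.Module):
    def __init__(self, vocab_size=30000, embed_dim=100, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 2)  # hateful vs. non-hateful

    def forward(self, token_ids):              # token_ids: (batch, seq_len)
        x = self.embed(token_ids)
        _, (h_n, _) = self.lstm(x)              # h_n: (1, batch, hidden_dim)
        return self.head(h_n[-1])               # logits: (batch, 2)

model = LSTMHateClassifier()
logits = model(torch.randint(0, 30000, (4, 20)))  # dummy batch of 4 tweets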
Leveraging metadata
Founta, A.M., Chatzakou, D., Kourtellis, N., Blackburn, J., Vakali, A., Leontiadis, I.: A unified deep learning architecture for abuse detection. In: Proc. of the 10th ACM Conf. on web science. pp. 105–114 (2019)
The individual classifiers that form the basis of the combined model. Left: the text-only classifier; right: the metadata-only classifier.
Data Augmentation
Juuti, M., Gröndahl, T., Flanagan, A., Asokan, N.: A little goes a long way: Improving toxic language classification despite data scarcity. In: Proc. of the 2020 Conf. on Empirical Methods in Natural Language Processing: Findings, pp. 2991–3009 (2020)
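As a toy illustration of text augmentation for a scarce toxic class, here is an EDA-style random swap/delete sketch; it is a generic technique sketch of our own, not the specific augmentation pipeline of Juuti et al. (2020).

import random

def augment(text, p_delete=0.1, n_swaps=1):
    words = text.split()
    for _ in range(n_swaps):                 # swap two random word positions
        if len(words) > 1:
            i, j = random.sample(range(len(words)), 2)
            words[i], words[j] = words[j], words[i]
    kept = [w for w in words if random.random() > p_delete]  # random deletion
    return " ".join(kept) if kept else text

print(augment("this is a scarce minority class example"))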
Tackling character-level adversarial attack
Intentionally misspelled words are a kind of adversarial attack, commonly adopted as a tool in manipulators' arsenal to evade detection.
Mou, G., Ye, P., Lee, K.: Swe2: Subword enriched and significant word emphasized framework for hate speech detection. In: Proc. of the 29th ACM Intl. Conf. on Information & Knowledge Management, pp. 1145–1154 (2020)
Tackling character-level adversarial attack
Mou, G., Ye, P., Lee, K.: Swe2: Subword enriched and significant word emphasized framework for hate speech detection. In: Proc. of the 29th ACM Intl. Conf. on Information & Knowledge Management, pp. 1145–1154 (2020)
Performance of our SWE2 models and baselines without the adversarial attack
Accuracy of our SWE2 model and the best baseline under the adversarial attack
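To illustrate the kind of character-level perturbation such detectors must withstand, here is a toy obfuscation sketch (ours, not the attack used in the SWE2 evaluation):

import random

LEET = {"a": "@", "e": "3", "i": "1", "o": "0", "s": "$"}  # toy substitutions

def obfuscate(word):
    # Randomly swap characters for look-alikes to evade keyword matching.
    return "".join(LEET.get(c, c) if random.random() < 0.5 else c
                   for c in word.lower())

print(obfuscate("hateful"))  # e.g., "h@t3ful"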
Multi-label classification
Parikh, P., Abburi, H., Badjatiya, Pinkesh, Krishnan, R., Chhaya, N., Gupta, M., Varma, Vasudeva: Multi-label categorization of accounts of sexism using a neural framework. In: Proc. of the 2019 Conf. on Empirical Methods in Natural Language Processing and the 9th Intl. Joint Conf. on Natural Language Processing (EMNLP-IJCNLP), pp. 1642–1652 (2019)
Agenda
Cyberbullying on the Instagram Social Network
Zhong, H., Li, H., Squicciarini, A.C., Rajtmajer, S.M., Griffin, C., Miller, D.J., Caragea, C.: Content-driven detection of cyberbullying on the Instagram social network. In: IJCAI, vol. 16, pp. 3952–3958 (2016)
Classification results using SVM with an RBF kernel, given various (concatenated) feature sets. BoW=Bag of Words; OFF=Offensiveness score; Captions=LDA-generated topics from image captions;
CNN-Cl=Clusters generated from outputs of a pre-trained CNN over images
Unsupervised cyberbullying detection
Cheng, L., Shu, K., Wu, S., Silva, Y.N., Hall, D.L., Liu, H.: Unsupervised cyberbullying detection via time-informed gaussian mixture model. In: Proc. of the 29th ACM Intl. Conf. on Information & Knowledge Management. pp. 185–194 (2020)
Unsupervised cyberbullying detection
The proposed framework achieves the best AUROC and competitive precision compared to the unsupervised baselines for both datasets.
Cheng, L., Shu, K., Wu, S., Silva, Y.N., Hall, D.L., Liu, H.: Unsupervised cyberbullying detection via time-informed gaussian mixture model. In: Proc. of the 29th ACM Intl. Conf. on Information & Knowledge Management. pp. 185–194 (2020)
Multimodal Twitter: MMHS150K
For the hate speech detection task, current multimodal models cannot outperform models analyzing only text.
Detection module: 1-layer 150D LSTM using 100D GloVe embeddings.
Gomez, R., Gibert, J., Gomez, L., Karatzas, D.: Exploring hate speech detection in multi-modal publications. WACV. pp. 1470–1478 (2020)
Hateful Memes Challenge
Kiela, D., Firooz, H., Mohan, A., Goswami, V., Singh, A., Ringshia, P., Testuggine, D.: The hateful memes challenge: Detecting hate speech in multimodal memes. Advances in Neural Information Processing Systems 33 (2020)
Hateful Memes Challenge
multimodal bi-transformers using Image-Grid/Image-Region
unimodally pretrained or pretrained on multimodal data
Kiela, D., Firooz, H., Mohan, A., Goswami, V., Singh, A., Ringshia, P., Testuggine, D.: The hateful memes challenge: Detecting hate speech in multimodal memes. Advances in Neural Information Processing Systems 33 (2020)
Multi-modal hate speech detection
Das, A., Wahi, J.S., Li, S.: Detecting hate speech in multi-modal memes. arXiv preprint arXiv:2012.14891 (2020)
Fine-tunes VisualBERT and BERT on the Facebook Hateful Memes dataset and on captions generated from its images.
RoBERTa for text encoding; VGG for visual sentiment.
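A common design in these multimodal systems is late fusion: encode text and image separately, concatenate, and classify. The sketch below shows this pattern with random stand-in features; it is our illustration, not the exact architecture of any paper above.

import torch
import torch.nn as nn

class LateFusionClassifier(nn.Module):
    def __init__(self, text_dim=768, image_dim=512):
        super().__init__()
        self.fuse = nn.Sequential(           # classify concatenated features
            nn.Linear(text_dim + image_dim, 256), nn.ReLU(), nn.Linear(256, 2)
        )

    def forward(self, text_feat, image_feat):
        return self.fuse(torch.cat([text_feat, image_feat], dim=-1))

model = LateFusionClassifier()
logits = model(torch.randn(4, 768), torch.randn(4, 512))  # stand-in encodings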
Agenda
Challenges
Fortuna, P., Nunes, S.: A survey on automatic detection of hate speech in text. ACM Computing Surveys (CSUR) 51(4), 1–30 (2018)
Nobata, C., Tetreault, J., Thomas, A., Mehdad, Y., Chang, Y.: Abusive language detection in online user content. In: Proc. of the 25th Intl. Conf. on world wide web. pp. 145–153 (2016)
Limitations of existing methods
MacAvaney, S., Yao, H.R., Yang, E., Russell, K., Goharian, N., Frieder, O.: Hate speech detection: Challenges and solutions. PLoS ONE 14(8), e0221152 (2019)
Thanks Q&A
SLOT-II
Agenda
Some Interesting observations
[1]: Hate is the New Infodemic: A Topic-aware Modeling of Hate Speech Diffusion on Twitter: https://arxiv.org/pdf/2010.04377.pdf
[2]: Would Your Tweet Invoke Hate on the Fly? Forecasting Hate Intensity of Reply Threads on Twitter: https://dl.acm.org/doi/10.1145/3447548.3467150
[3]: Better Prevent than React: Deep Stratified Learning to Predict Hate Intensity of Twitter Reply Chains: https://ieeexplore.ieee.org/document/9679052
Revisiting Metadata and Network Context
A Unified Deep Learning Architecture for Abuse Detection: https://arxiv.org/abs/1802.00385
Inter and Intra user history context
Leveraging Intra-User and Inter-User Representation Learning for Automated Hate Speech Detection: https://aclanthology.org/N18-2019.pdf
Network Characteristics of Hateful Users
Characterizing and Detecting Hateful Users on Twitter: https://arxiv.org/pdf/1803.08977.pdf
Diffusion and User Modeling of Hateful Text
Spread of hate speech in online social media: https://arxiv.org/abs/1812.01693
Diffusion and User Modeling of Hateful Text
Spread of hate speech in online social media: https://arxiv.org/abs/1812.01693
KH: known hateful users; NH: non-hateful users
Additional Studies
Limitations of Existing Diffusion Analysis
Hate Diffusion on Tweet Retweets
Hate is the New Infodemic: A Topic-aware Modeling of Hate Speech Diffusion on Twitter: https://arxiv.org/pdf/2010.04377.pdf
Hate Diffusion on Tweet Retweets: RETINA model
(a) Exogenous attention
(b) Static retweet prediction model
(c) Dynamic retweet prediction model
Hate is the New Infodemic: A Topic-aware Modeling of Hate Speech Diffusion on Twitter: https://arxiv.org/pdf/2010.04377.pdf
Hate Diffusion on Tweet Retweets: RETINA model
Marked entries signify models without exogenous influence
Hate is the New Infodemic: A Topic-aware Modeling of Hate Speech Diffusion on Twitter: https://arxiv.org/pdf/2010.04377.pdf
Hate Diffusion on Tweet Replies
Would Your Tweet Invoke Hate on the Fly? Forecasting Hate Intensity of Reply Threads on Twitter: https://dl.acm.org/doi/10.1145/3447548.3467150
Hate Diffusion on Tweet Replies: DESSRt Model
Would Your Tweet Invoke Hate on the Fly? Forecasting Hate Intensity of Reply Threads on Twitter: https://dl.acm.org/doi/10.1145/3447548.3467150
Hate Diffusion on Tweet Replies: DESSRt Model
The model shows consistent performance irrespective of the type of source user and source tweet.
Would Your Tweet Invoke Hate on the Fly? Forecasting Hate Intensity of Reply Threads on Twitter: https://dl.acm.org/doi/10.1145/3447548.3467150
Hate Diffusion on Tweet Replies: DRAGNET model
Better Prevent than React: Deep Stratified Learning to Predict Hate Intensity of Twitter Reply Chains: https://ieeexplore.ieee.org/document/9679052
Real-World Deployments of Hate Diffusion Models
Limitations and Future Scope
Existing models struggle to capture dynamic/ever-changing forms of hate.
Real-world events can trigger an increase in hateful behaviour online/offline; capturing such inter-domain knowledge can help in early detection of hateful content.
Thanks Q&A
Break (5 mins)
SLOT-III
Psychological Analysis of Online Hate Spreaders
Amitava Das
Agenda
Intervention Strategies for Online Hate
Sarah Masud
Agenda
Countering Hateful Content on Social Media
Data Collection Strategy for Counter Narration
Generating Counter Narratives against Online Hate Speech: Data and Strategies: https://arxiv.org/pdf/2004.04216.pdf
Table 1: Characteristics of collection methods
Analyzing the hate and counter speech accounts on Twitter
Analyzing the hate and counter speech accounts on Twitter: https://arxiv.org/pdf/1812.02712.pdf
CONAN: Multilingual Parallel Counter Dataset (Nichesourcing)
CONAN - COunter NArratives through Nichesourcing: a Multilingual Dataset of Responses to Fight Online Hate Speech: https://arxiv.org/pdf/1910.03270.pdf
CONAN: Multilingual Parallel Counter Dataset (Nichesourcing)
Fine-grained Hate Class
Fine-grained Counter-Hate Class
CONAN - COunter NArratives through Nichesourcing: a Multilingual Dataset of Responses to Fight Online Hate Speech: https://arxiv.org/pdf/1910.03270.pdf
Pointers adopted by NGOs for counter-narrative generation
Author-Reviewer Architecture
Generating Counter Narratives against Online Hate Speech: Data and Strategies: https://arxiv.org/pdf/2004.04216.pdf
Author-Reviewer Architecture
Authoring via machine-generated counter text
Reviewing via machine classification of HS-CN pairs
Manual Validation
Generating Counter Narratives against Online Hate Speech: Data and Strategies: https://arxiv.org/pdf/2004.04216.pdf
Empathy-based Counter Speech
Empathy-based counterspeech can reduce racist hate speech in a social media field experiment: https://www.pnas.org/content/118/50/e2116310118
Proactive Strategies
Offensive to Non-Offensive Unsupervised Style Transfer
S_i and S_j represent the two styles: offensive and non-offensive. The method is unsupervised and uses a non-parallel, unlabeled corpus.
Fighting Offensive Language on Social Media with Unsupervised Text Style Transfer: https://arxiv.org/pdf/1805.07685.pdf
Reconsidering Tweets: Intervening During Tweet Creation
Reconsidering Tweets: Intervening During Tweet Creation Decreases Offensive Content https://arxiv.org/abs/2112.00773
Thanks Q&A
SLOT-IV
Agenda
Analysis of Bias in Hate Speech Detection
Pinkesh Badjatiya
Agenda
Definition
Why does a model learn these biases?
How to learn an unbiased model from biased conversations?
Impact of biased predictions
Examples of Incorrect predictions from Google’s Perspective API
(as of Aug 15, 2018)
Examples | Predicted Hate Label (Score) |
Those guys are nerds | Hateful (0.83) |
Can you throw that garbage please | Hateful (0.74) |
People will die if they kill Obamacare | Hateful (0.78) |
Oh shit. I did that mistake again | Hateful (0.91) |
that arab killed the plants | Hateful (0.87) |
I support gay marriage. I believe they have a right to be as miserable as the rest of us. | Hateful (0.77) |
Mitigating Bias in Learning
Goal:
✔ The model is fair towards all ethnic groups, minorities, and genders
✔ Bias from social media is not learnt
Choices for Bias Mitigation
Statistical Correction: techniques that attempt to distribute samples of every kind uniformly across the target classes, altering the training set to balance term usage across classes.
Example: Strategic Sampling, Data Augmentation
Ex. This is a hateful sentence for muslim 🡪 +ve
Ex. This is a hateful sentence for muslim 🡪 +ve
Ex. This is NOT a hateful sentence for muslim 🡪 -ve
Limitations: Not always possible to create balanced samples for all the keywords
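A minimal sketch of the oversampling idea (toy data, our illustration): replicate under-represented examples until the labels are balanced.

import random
from collections import Counter

data = [("this is a hateful sentence for muslim", 1),
        ("this is a hateful sentence for muslim", 1),
        ("this is NOT a hateful sentence for muslim", 0)]

counts = Counter(label for _, label in data)
target = max(counts.values())
balanced = list(data)
for label, n in counts.items():
    pool = [d for d in data if d[1] == label]
    balanced += random.choices(pool, k=target - n)  # oversample minority label

print(Counter(label for _, label in balanced))      # now 2 vs. 2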
Choices for Bias Mitigation
Statistical Correction:
Example: Adversarial Filters of Dataset Biases (Bras et al., ICML 2020)
An iterative greedy algorithm that can adversarially filter the biases from the training dataset
De-biased Version of Dataset
Choices for Bias Mitigation
Model Correction: make changes to the model, such as modifying word embeddings or debiasing during model training
Example: Ensemble Learning
Diagram: an ensemble of black-box models (Model 1, Model 2, Model 3) combined into a single prediction.
Choices for Bias Mitigation
Model Correction: make changes to the model, such as modifying word embeddings or debiasing during model training
Example: Adversarial Learning (Xia et al., 2020)
Limitations: Need labels for all the private attributes that we want to correct
Diagram: input sentence → model → "Hateful?" prediction. A gradient reversal layer (GRL) routes the shared representation to a private-attribute head (e.g., gender), so the model learns to identify hate speech but NOT the gender.
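A minimal PyTorch sketch of the gradient reversal idea (our illustration of the general technique): the forward pass is the identity, but the backward pass negates gradients, so the shared encoder is pushed away from features that predict the private attribute.

import torch

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lambd=1.0):
        ctx.lambd = lambd
        return x.view_as(x)                       # identity in the forward pass

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None     # reversed gradient flows back

features = torch.randn(4, 64, requires_grad=True) # shared encoder output
reversed_feats = GradReverse.apply(features)      # feed to the attribute head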
Choices for Bias Mitigation
Model Correction:
Example: Statistical model re-weighing (Utama et al., 2020)
An input example that contains lexical-overlap bias is predicted as entailment by the teacher model with high confidence. When the biased model predicts this example well, the teacher's output distribution is re-scaled to indicate higher uncertainty (lower confidence). The re-scaled output distributions are then used to distill the main model.
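A hedged sketch of the re-scaling step (our reading of Utama et al., 2020): the teacher's soft targets are flattened in proportion to the biased model's confidence on the gold label, so the distilled main model is not rewarded for exploiting the bias.

import numpy as np

def rescale(teacher_probs, bias_conf):
    # Higher biased-model confidence => flatter (more uncertain) soft target.
    scaled = teacher_probs ** (1.0 - bias_conf)
    return scaled / scaled.sum()

teacher = np.array([0.9, 0.1])            # confident teacher prediction
print(rescale(teacher, bias_conf=0.95))   # near-uniform: biased example
print(rescale(teacher, bias_conf=0.05))   # nearly unchanged: unbiased example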
Choices for Bias Mitigation
Data Correction: focuses on converting samples to a simpler form by reducing the amount of information available to the classifier during the learning stage.
Example: Private-attribute masking, Knowledge generalization (Badjatiya et al., 2019)
Ex. This is a hateful sentence for muslim
Ex. This is a hateful sentence for ########
🡪 Can we do better?
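First, a minimal sketch of the masking step shown above (the keyword list is a tiny placeholder of our own); the knowledge-based alternative follows.

SENSITIVE = {"muslim", "christian", "women", "men"}  # placeholder keyword list

def mask_sensitive(text, mask="########"):
    # Replace bias-sensitive keywords with a neutral mask token.
    return " ".join(mask if w.lower() in SENSITIVE else w
                    for w in text.split())

print(mask_sensitive("This is a hateful sentence for muslim"))
# -> "This is a hateful sentence for ########"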
Choices for Bias Mitigation
Knowledge-based Generalizations
WordNet Hierarchy
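Doing better than a blank mask: replace the sensitive keyword with a WordNet hypernym so some meaning is preserved. A hedged sketch (requires the NLTK WordNet corpus; the first-synset heuristic is our simplification):

from nltk.corpus import wordnet as wn  # needs: nltk.download("wordnet")

def generalize(word):
    # Walk one step up the WordNet hierarchy (first synset, first hypernym).
    synsets = wn.synsets(word)
    if not synsets or not synsets[0].hypernyms():
        return word
    return synsets[0].hypernyms()[0].lemmas()[0].name()

print(generalize("muslim"))  # e.g., a generic term like "religious_person"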
Challenges and Limitations
Current Trends: Hate Speech Keeping Up with NLP
Preslav Nakov
Multi-Class Datasets
SemEval-2019 Task 6: Identifying and Categorizing Offensive Language in Social Media (OffensEval).
Marcos Zampieri, Shervin Malmasi, Preslav Nakov, Sara Rosenthal, Noura Farra, Ritesh Kumar:
The OLID Hierarchy
NAACL-HLT'2019: Predicting the Type and Target of Offensive Posts in Social Media.
Marcos Zampieri, Shervin Malmasi, Preslav Nakov, Sara Rosenthal, Noura Farra, Ritesh Kumar
Fine-Grained Hate Speech: OLID Dataset
Level A (Content Type): Offensive, Non-Offensive
    Level B (Offense Type): Targeted, Untargeted
        Level C (Target Type): Individual, Group, Others
https://aclanthology.org/N19-1144/
NAACL-HLT'2019: Predicting the Type and Target of Offensive Posts in Social Media.
Marcos Zampieri, Shervin Malmasi, Preslav Nakov, Sara Rosenthal, Noura Farra, Ritesh Kumar
Fine-Grained Hate Speech: OLID Dataset
Level A
https://aclanthology.org/N19-1144/
NAACL-HLT'2019: Predicting the Type and Target of Offensive Posts in Social Media.
Marcos Zampieri, Shervin Malmasi, Preslav Nakov, Sara Rosenthal, Noura Farra, Ritesh Kumar
Fine-Grained Hate Speech: OLID Dataset
Level B and Level C
https://aclanthology.org/N19-1144/
NAACL-HLT'2019: Predicting the Type and Target of Offensive Posts in Social Media.
Marcos Zampieri, Shervin Malmasi, Preslav Nakov, Sara Rosenthal, Noura Farra, Ritesh Kumar
SOLID: Semi-Supervised Extension
Used heterogeneous machine learning models that have diverse inductive biases:
Then applied democratic co-training to generate semi-supervised labels, using OLID as a seed dataset and distant supervision from the ensemble of models above.
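A toy sketch of the aggregation step (our illustration; the model confidences below are placeholder numbers): each model in the ensemble scores an unlabeled tweet, and the mean and spread of their confidences become the semi-supervised label.

import statistics

def aggregate(confidences):                 # one confidence score per model
    return statistics.mean(confidences), statistics.stdev(confidences)

# e.g., confidences from diverse models such as PMI, FastText, LSTM, BERT
models_offensive_conf = [0.91, 0.84, 0.88, 0.79]
mean, std = aggregate(models_offensive_conf)
print(f"label=OFF avg={mean:.2f} std={std:.2f}")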
ACL-2021 (Findings): SOLID: A Large-Scale Semi-Supervised Dataset for Offensive Language Identification.
Sara Rosenthal, Pepa Atanasova, Georgi Karadzhov, Marcos Zampieri, Preslav Nakov
Data Statistics
ACL-2021 (Findings): SOLID: A Large-Scale Semi-Supervised Dataset for Offensive Language Identification.
Sara Rosenthal, Pepa Atanasova, Georgi Karadzhov, Marcos Zampieri, Preslav Nakov
The Impact of Adding SOLID
ACL-2021 (Findings): SOLID: A Large-Scale Semi-Supervised Dataset for Offensive Language Identification.
Sara Rosenthal, Pepa Atanasova, Georgi Karadzhov, Marcos Zampieri, Preslav Nakov
SemEval-2020 Task 12: Multilingual Offensive Language Identification in Social Media (OffensEval 2020).
Marcos Zampieri, Preslav Nakov, Sara Rosenthal, Pepa Atanasova, Georgi Karadzhov, Hamdy Mubarak, Leon Derczynski, Zeses Pitenis, Çagri Çöltekin
OffensEval 2020: Multilingual Offensive Language Identification in Social Media
OffensEval 2020: Level A (all languages)
SemEval-2020 Task 12: Multilingual Offensive Language Identification in Social Media (OffensEval 2020).
Marcos Zampieri, Preslav Nakov, Sara Rosenthal, Pepa Atanasova, Georgi Karadzhov, Hamdy Mubarak, Leon Derczynski, Zeses Pitenis, Çagri Çöltekin
Neighborhood-Based Content Flagging
TACL (2022): A Neighbourhood Framework for Resource-Lean Content Flagging.
Sheikh Muhammad Sarwar, Dimitrina Zlatkova, Momchil Hardalov, Yoan Dinkov, Isabelle Augenstein, Preslav Nakov
Neighborhood-Based Content Flagging
Neighborhood Representation (each post is either flagged or non-flagged)

Query \ Neighbor | Abusive | Benign |
Abusive | Entail | Contradict |
Benign | Contradict | Entail |
TACL (2022): A Neighbourhood Framework for Resource-Lean Content Flagging.
Sheikh Muhammad Sarwar, Dimitrina Zlatkova, Momchil Hardalov, Yoan Dinkov, Isabelle Augenstein, Preslav Nakov
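An illustrative nearest-neighbour sketch of the idea above (ours; the real framework learns the query-neighbour interaction rather than using fixed similarity-weighted voting):

import numpy as np

def flag(query_vec, neighbor_vecs, neighbor_labels, k=3):
    sims = neighbor_vecs @ query_vec         # similarity to each labelled post
    top = np.argsort(-sims)[:k]              # k most similar neighbours
    w = np.exp(sims[top]); w /= w.sum()      # softmax weights over neighbours
    return float(w @ neighbor_labels[top]) > 0.5  # True => flag as abusive

rng = np.random.default_rng(0)
vecs = rng.normal(size=(10, 8))                       # stand-in embeddings
labels = rng.integers(0, 2, size=10).astype(float)    # 1 = abusive neighbour
print(flag(rng.normal(size=8), vecs, labels))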
HateBERT
WOAH@ACL-2021: HateBERT: Retraining BERT for Abusive Language Detection in English
Tommaso Caselli, Valerio Basile, Jelena Mitrović, Michael Granitzer
Hate Speech Detection Using GPT-3 Prompts
Zero-Shot
https://beta.openai.com/playground/p/BjTry9NqZqLebAnYnRmnuD57?model=davinci
One-shot
https://beta.openai.com/playground/p/QcqZSdfFPCei0ae5ePJkK1va?model=davinci
Few-shot
https://beta.openai.com/playground/p/4Qsizf82t07oMVJZiZrg9KXM?model=davinci
Hate Speech Detection via GPT-3 Prompts: https://arxiv.org/pdf/2103.12407.pdf
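A hedged sketch of few-shot prompt construction in the spirit of the playground links above (only the prompt string is built; the actual completion call is provider-specific and omitted):

FEW_SHOT = """Classify each post as "hateful" or "not hateful".

Post: I love spending time with my friends.
Label: not hateful

Post: [a slur-laden attack on a group]
Label: hateful

Post: {post}
Label:"""

# The filled prompt would be sent to the language model for completion.
print(FEW_SHOT.format(post="Example post to classify"))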
HATECHECK: Functional Tests for Hate Speech
ACL’2021: HateCheck: Functional Tests for Hate Speech Detection Models.
Paul Röttger, Bertram Vidgen, Dong Nguyen, Zeerak Waseem, Helen Z. Margetts, Janet B. Pierrehumbert
Beyond Hate Speech:
Detecting Harmful Memes and Their Targets
ACL'2021 (Findings): Detecting Harmful Memes and Their Targets.
Shraman Pramanick, Dimitar Dimitrov, Rituparna Mukherjee, Shivam Sharma, Md. Shad Akhtar, Preslav Nakov, Tanmoy Chakraborty
Detecting Harmful Memes and Their Targets
Problem 1 (Harmful meme detection):
Problem 2 (Target identification of harmful memes):
HarMeme: 3,544 memes related to COVID-19.
ACL'2021 (Findings): Detecting Harmful Memes and Their Targets.
Shraman Pramanick, Dimitar Dimitrov, Rituparna Mukherjee, Shivam Sharma, Md. Shad Akhtar, Preslav Nakov, Tanmoy Chakraborty
Beyond Hate Speech:
Detecting Harmful Memes and Their Targets
EMNLP'2021 (Findings): MOMENTA: A Multimodal Framework for Detecting Harmful Memes and Their Targets
Shraman Pramanick, Shivam Sharma, Dimitar Dimitrov, Md. Shad Akhtar, Preslav Nakov, Tanmoy Chakraborty
Beyond Hate Speech: Propaganda
Institute for Propaganda Analysis
IJCAI-2020: A Survey on Computational Propaganda Detection.
Giovanni Da San Martino, Stefano Cresci, Alberto Barrón-Cedeño, Seunghak Yu, Roberto Di Pietro, Preslav Nakov
Beyond Hate Speech:
Fine-Grained Propaganda Detection
Dataset
EMNLP-2019: Fine-Grained Analysis of Propaganda in News Articles.
Giovanni Da San Martino, Seunghak Yu, Alberto Barrón-Cedeño, Rostislav Petrov, Preslav Nakov
ACL-2020 (best demo award, honorable mention): Prta: A System to Support the Analysis of Propaganda Techniques in the News.
Giovanni Da San Martino, Shaden Shaar, Yifan Zhang, Seunghak Yu, Alberto Barrón-Cedeño, Preslav Nakov
Beyond Hate Speech:
Shared Tasks on Propaganda Techniques
Techniques in text:
Techniques in text+image:
{
  "id": "125",
  "labels": [
    "Loaded Language",
    "Name calling/Labeling"
  ],
  "text": "I HATE TRUMP\n\nMOST TERRORIST DO"
}
{
  "id": "125",
  "labels": [
    "Reductio ad hitlerum",
    "Smears",
    "Loaded Language",
    "Name calling/Labeling"
  ],
  "text": "I HATE TRUMP\n\nMOST TERRORIST DO",
  "image": "125_image.png"
}
Beyond Hate Speech:
SemEval-2021: Propaganda Techniques in Memes
SemEval-2021: SemEval-2021 Task 6: Detection of Persuasive Techniques in Texts and Images
Dimitar Dimitrov, Bishr Bin Ali, Shaden Shaar, Firoj Alam, Fabrizio Silvestri, Hamed Firooz, Preslav Nakov and Giovanni Da San Martino
Beyond Hate Speech:
Propaganda Detection in Memes
Appeal to Fear; Black & White Fallacy
Whataboutism
ACL-2021 (Findings): Detecting Propaganda Techniques in Memes.
Dimitar Dimitrov, Bishr Bin Ali, Shaden Shaar, Firoj Alam, Fabrizio Silvestri, Hamed Firooz, Preslav Nakov and Giovanni Da San Martino
Beyond Hate Speech: Policies of Big Tech
ArXiv 2021: Detecting Abusive Language on Online Platforms: A Critical Analysis.
Preslav Nakov, Vibha Nayak, Kyle Dent, Ameya Bhatawdekar, Sheikh Muhammad Sarwar, Momchil Hardalov, Yoan Dinkov, Dimitrina Zlatkova, Guillaume Bouchard, Isabelle Augenstein
Beyond Hate Speech:
Policies of Product-Specific Platforms
ArXiv 2021: Detecting Abusive Language on Online Platforms: A Critical Analysis.
Preslav Nakov, Vibha Nayak, Kyle Dent, Ameya Bhatawdekar, Sheikh Muhammad Sarwar, Momchil Hardalov, Yoan Dinkov, Dimitrina Zlatkova, Guillaume Bouchard, Isabelle Augenstein
Zero-Shot Classification
Cross-lingual Zero- and Few-shot Hate Speech Detection utilising frozen Transformer Language Models and AXEL: https://arxiv.org/pdf/2004.13850.pdf
Zero-Shot Classification via BERT
Using Transfer-based Language Models to Detect Hateful and Offensive Language Online: https://aclanthology.org/2020.alw-1.3/
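For reference, zero-shot classification can be sketched with an NLI-based model via the Hugging Face pipeline API; the model choice and label set below are our illustrative assumptions, not the setups of the papers cited here.

from transformers import pipeline

clf = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
result = clf("I can't stand those people, they should go back home.",
             candidate_labels=["hate speech", "offensive", "neutral"])
print(result["labels"][0], result["scores"][0])  # top label and its score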
HateBERT: Retraining BERT for Abusive Language Detection in English
Fine-tuned results comparison
Fine-tuned results comparison (cross-dataset training and testing)
HateBERT: Retraining BERT for Abusive Language Detection in English: https://arxiv.org/abs/2010.12472
Hate Speech Detection via GPT-3 Prompts
Hate Speech Detection via GPT-3 Prompts: https://arxiv.org/pdf/2103.12407.pdf
Cross-lingual Hate Speech Detection
Exposing the limits of Zero-shot Cross-lingual Hate Speech Detection: https://aclanthology.org/2021.acl-short.114/
Limitations
Exposing the limits of Zero-shot Cross-lingual Hate Speech Detection: https://aclanthology.org/2021.acl-short.114/
Concluding Remarks
Key Takeaways
Future Scope
Thanks Q&A