1 of 221

Tutorial on

Combating Online Hate Speech:

The Role of Content, Networks, Psychology, User Behavior, etc.

2 of 221

Our Team

Sarah Masud IIIT-D, India

Pinkesh Badjatiya Microsoft, India

Amitava Das Wipro, India

Manish Gupta Microsoft, India

Tanmoy Chakraborty

IIIT-D, India

Preslav Nakov

QCRI, Qatar

3 of 221

Available at: https://hatewash.github.io/#outline

Tutorial Outline

  • Slot I: (60 mins)
    • Introduction: 20 mins (Tanmoy)
    • Hate Speech Detection: 30 mins (Manish)
    • Questions: (10 mins)
  • Slot II: (50 mins)
    • Hate Speech Diffusion: 40 mins (Sarah)
    • Questions: (10 mins)
  • Break (5 mins)
  • Slot III: (60 mins)
    • Psychological Analysis of Hate Spreaders: 25 mins (Amitava)
    • Intervention Measures for Hate Speech: 25 mins (Sarah)
    • Questions: (10 mins)
  • Slot IV: (65 mins)
    • Overview of Bias in Hate Speech: 25 mins (Pinkesh)
    • Current Developments: 25 mins (Preslav)
    • Future Scope & Concluding Remarks: 5 mins (Tanmoy/Sarah)
    • Questions: (10 mins)

4 of 221

Why Study Hate Speech?

5 of 221

Various Forms of Malicious Online Content

CyberBullying

Abuse

Profanity

Offense

Aggression

Provocation

Toxicity

Spam

FakeNews

Rumours

HateSpeech

Trolling

Personal Attacks

  • Our online experiences are clouded by the presence of malicious content.
  • Anonymity has led to an increase in anti-social behaviour [1], with hate speech being one form of it.
  • Such content can be studied at a macroscopic as well as a microscopic level.
  • Common forms of hate include:
    • Xenophobia
    • Racism
    • Sexism
    • Islamophobia
  • Such malicious content appears in all media formats:
    • Text
    • Speech
    • Images, memes, audio-video
    • Email, DMs, comments, replies

Fraud

[1] https://pubmed.ncbi.nlm.nih.gov/15257832/

6 of 221

Statistics of Hate Speech Prevalence

1134 Americans surveyed from Dec 17, 2018 to Dec 27, 2018

Anti-Defamation League https://www.adl.org/onlineharassment

Percentage of U.S. Adults Who Have Experienced Harassment Online

Reasons for Online Hate

Percentage of Respondents Who Were Targeted Because of Their Membership in a Protected Class

7 of 221

Hate speech on the Internet is an age-old problem

Fig 1: List of extremist/controversial subreddits (https://en.wikipedia.org/wiki/Controversial_Reddit_communities)

Fig 2: YouTube video inciting violence and hate crime (https://www.youtube.com/watch?v=1ndq79y1ar4)

Fig 3: Twitter hate speech (https://theconversation.com/hate-speech-is-still-easy-to-find-on-social-media-106020)

Fig 4: Twitter offensive speech (https://twitter.com/AdhirajGabbar/status/1348145356282884097)

8 of 221

Internet’s policy w.r.t. curbing hate

Some famous platforms with stricter policies:

  1. Twitter
  2. Facebook
  3. Instagram
  4. Youtube
  5. Reddit

Flag Bearer of Free Speech (as a home for hate speech): Unmoderated platforms

  1. Gab
  2. 4chan
  3. BitChute
  4. Parler
  5. StormFront
  • Unmoderated content on platforms like Gab contains more negative sentiment and higher toxicity compared to moderated content on platforms like Twitter. [2]
  • Banning users is not as effective as it appears: Users regroup on other platforms, or find backdoor entries into the banned platform, spreading more aggressive content than before. [1]
  • Interestingly, gender-based hate speech is a major hate theme across platforms [2]

[2]: Characterizing (Un)moderated Textual Data in Social Systems

9 of 221

Ill Effects of Hate Speech

  • Based on the entity being harmed:
    • Targeted individuals
    • Vulnerable groups
    • Society as a collective

  • Based on the actions:
    • Online abuse
    • Offline crimes
    • Online hate leading to offline hate crimes

10 of 221

Ill Effects of Hate Speech

1134 Americans surveyed from Dec 17, 2018 to Dec 27, 2018

Anti-Defamation League https://www.adl.org/onlineharassment

Harassment of Daily Users of Platforms

Impact of Online Hate and Harassment

Societal Impact of Online Hate and Harassment

11 of 221

Why is studying hate speech detection critical?

  • The COVID-19 pandemic brought the online world closer than ever.
    • 70% increase in hate speech among teens and kids online
    • Toxicity levels in the gaming community have increased by 40%
  • People are more likely to adopt aggressive behaviour because of online anonymity.
  • Mandatory requirements set by governments
  • Quality of service
    • Social media companies provide a service.
    • They profit from this service and, therefore, assume public obligations with respect to the contents transmitted.
    • Hence, they must discourage online hate and remove hate speech within a reasonable time.
  • Online hate can lead to real-world riots.
  • More than half of all hate-related terrestrial attacks following 9/11 occurred within two weeks of the event. An automated cyber hate classification system could support more proactive public order management in the first two weeks following an event.

https://l1ght.com/Toxicity_during_coronavirus_Report-L1ght.pdf

Fortuna, P., Nunes, S.: A survey on automatic detection of hate speech in text. ACM Computing Surveys (CSUR)51(4), 1–30 (2018)

Burnap, P., Williams, M.L.: Us and them: identifying cyber hate on twitter across multiple protected characteristics. EPJ Data science5, 1–15 (2016)

12 of 221

Definition of hate speech

  • A post or piece of content (language/image)
  • Targeting a specific group of people or a member of such a group
  • Based on “protected characteristics” like race, ethnicity, national origin, religious affiliation, sexual orientation, sex, gender, descent, or serious disability or disease
  • With the malicious intention of spreading hate, being derogatory, encouraging violence, dehumanizing (comparing people to non-human things, e.g. animals), insulting, or promoting or justifying hatred, discrimination or hostility
  • It includes statements of inferiority, and calls for exclusion or segregation.

Badjatiya, Pinkesh, Gupta, S.,Gupta, Manish, Varma, Vasudeva: Deep learning for hate speech detection in tweets. In: Proceedings of the 26th international conference on World Wide Web companion. pp. 759–760 (2017) Bhardwaj, M., Akhtar, M.S., Ekbal, A.,Das, Amitava, Chakraborty, Tanmoy: Hostility detection dataset in hindi. arXiv preprint arXiv:2011.03588 (2020)

Davidson, T., Warmsley, D., Macy, M., Weber, I.: Automated hate speech detection and the problem of offensive language. In: Proc. of the Intl. AAAI Conf. on Web and Social Media. vol. 11 (2017) Fortuna, P., Nunes, S.: A survey on automatic detection of hate speech in text. ACM Computing Surveys (CSUR)51(4), 1–30 (2018)

Youtube, Facebook, Twitter

Kiela, D., Firooz, H., Mohan, A., Goswami, V., Singh, A., Ringshia, P., Testuggine, D.: The hateful memes challenge: Detecting hate speech in multimodal memes. Advances in Neural Information Processing Systems33(2020) MacAvaney, S., Yao, H.R., Yang, E., Russell, K., Goharian, N., Frieder, O.: Hate speech detection: Challenges and solutions. PloS one14(8), e0221152 (2019)

https://www.adl.org/sites/default/files/documents/pyramid-of-hate.pdf

13 of 221

Hate Speech Detection

Manish Gupta

gmanish@microsoft.com

21st Feb 2021

14 of 221

Agenda

  • Why is hate speech detection important?
  • Hate speech datasets
  • Feature based approaches
  • Deep learning methods
  • Multimodal hate speech detection
  • Challenges and limitations

15 of 221

Popular social network datasets

  • Twitter: English. 16914 tweets; 3383 are labeled as sexist, 1972 as racist, 11559 as neutral. [Waseem et al. 2016]
  • Twitter: English [Wijesiriwardene et al. 2020] dataset of toxicity (harassment, offensive language, hate speech)
  • [Davidson et al. 2017]. 24802 tweets.
    • 5% hate speech, 76% offensive, remainder non-offensive
  • Hindi [Bhardwaj et al. 2020]
    • 8200 hostile and non-hostile texts from various social media platforms like Twitter, Facebook, WhatsApp, etc
    • Multi-label
    • four hostility dimensions: fake news (1638), hate speech (1132), offensive (1071), and defamation posts (810), along with a non-hostile label (4358).
  • English Gab. [Chandra et al. 2020]
    • 7601 posts. Anti-Semitism.
    • Presence of abuse, severity (‘Biased Attitude’, ‘Act of Bias and Discrimination’ and ‘Violence and Genocide’) and target of abusive behavior (individual 2nd/3rd person, group)

Waseem, Zeerak, and Dirk Hovy. "Hateful symbols or hateful people? predictive features for hate speech detection on twitter." In Proceedings of the NAACL student research workshop, pp. 88-93. 2016.

Bhardwaj, M., Akhtar, M.S., Ekbal, A.,Das, Amitava, Chakraborty, Tanmoy: Hostility detection dataset in hindi. arXiv preprint arXiv:2011.03588 (2020)

Wijesiriwardene, Thilini, Hale Inan, Ugur Kursuncu, Manas Gaur, Valerie L. Shalin, Krishnaprasad Thirunarayan, Amit Sheth, and I. Budak Arpinar. "Alone: A dataset for toxic behavior among adolescents on twitter." In International Conference on Social Informatics, pp. 427-439. Springer, Cham, 2020.

Chandra, M., Pathak, A., Dutta, E., Jain, P.,Gupta, Manish, Shrivastava, M., Kumaraguru,P.: Abuseanalyzer: Abuse detection, severity and target prediction for gab posts. In: Proc. of the 28th Intl. Conf. on Computational Linguistics. pp. 6277–6283 (2020) Davidson, T., Warmsley, D., Macy, M., Weber, I.: Automated hate speech detection and the problem of offensive language. In: Proc. of the Intl. AAAI Conf. on Web and Social Media. vol. 11 (2017)

16 of 221

Other popular datasets

  • Instagram [Hosseinmardi et al. 2015]: 678 bully sessions out of 2218. 155260 comments.
  • Vine [Rafiq et al. 2015]: 304 bully sessions from 970. 78250 comments.
  • Instagram [Zhong et al. 2016]: 3000 images. Cyberbullying. 560 bullied, 2540 not. 30 comments each, taken from 1120 images, are labeled as bullying or not.
  • Multi-modal Hateful Memes Dataset [Kiela et al. 2020]
  • MMHS150K [Gomez et al. 2020]. Multi-modal. Twitter.
    • 150K from Sep 2018 to Feb 2019.
    • 112845 not-hate and 36978 hate tweets.
    • 11925 racist, 3495 sexist, 3870 homophobic, 163 religion-based hate and 5811 other hate tweets
  • Kaggle Toxic Comment Classification Challenge dataset: used by [Juuti et al. 2020]
    • human-labeled English Wikipedia comments in six different classes of toxic language: toxic, severe toxic, obscene, threat, insult, and identity-hate.
    • Of the threat documents in the full training dataset (GOLD STANDARD), 449/478 overlap with toxic. For identity-hate, overlap with toxic is 1302/1405.

Homa Hosseinmardi, Sabrina Arredondo Mattson, Rahat Ibn Rafiq, Richard Han, Qin Lv, and Shivakant Mishra. 2015. Analyzing labeled cyberbullying incidents on the instagram social network. In Socinfo. Springer, 49–66. Rahat Ibn Rafiq, Homa Hosseinmardi, Richard Han, Qin Lv, Shivakant Mishra, and Sabrina Arredondo Mattson. 2015. Careful what you share in six seconds: Detecting cyberbullying instances in Vine. In ASONAM. ACM, 617–622 Zhong, H., Li, H., Squicciarini, A.C., Rajtmajer, S.M., Griffin, C., Miller, D.J., Caragea, C.:Content-driven detection of cyberbullying on the instagram social network. In: IJCAI. vol. 16,pp. 3952–3958 (2016)

Kiela, D., Firooz, H., Mohan, A., Goswami, V., Singh, A., Ringshia, P., Testuggine, D.: The hateful memes challenge: Detecting hate speech in multimodal memes. Advances in Neural Information Processing Systems33(2020) Gomez, R., Gibert, J., Gomez, L., Karatzas, D.: Exploring hate speech detection in multi-modal publications. In: Proc. of the IEEE/CVF Winter Conf. on Applications of Computer Vision. pp. 1470–1478 (2020)

Juuti, M., Gröndahl, T., Flanagan, A., Asokan, N.: A little goes a long way: Improving toxic language classification despite data scarcity. In: Proc. of the 2020 Conf. on Empirical Methods in Natural Language Processing: Findings. pp. 2991–3009 (2020)

17 of 221

Other popular datasets

  • SafeCity [Karlekar et al. 2018]
    • Each of the 9,892 stories includes a description of the incident, the location, and tagged forms of harassment. 13 tags. Top three—groping/touching, staring/ogling, and commenting
  • Gab hate corpus (GHC): 27655
    • Train: 24,353 posts with 2,027 labeled as hate
    • Test: 1,586 posts with 372 labeled as hate
  • Stormfront web domain:
    • 7,896 (1,059 hate) training sentences, 979 (122) validation, and 1,998 (246) test.
  • Comments found on Yahoo! Finance and News [Nobata et al. 2016]
    • Finance: 53516 abusive and 705886 clean comments.
    • News: 228119 abusive and 1162655 clean comments.
  • Sexism sub-categorization [Parikh et al. 2019]
    • 13023 accounts of sexism from EveryDaySexism, multilabel, 23-class.
  • Whisper: June 2014-June 2015. [Silva et al. 2016]
    • 7604 hate whispers; used templates.
  • Hatebase – large black lists.

Karlekar, S., Bansal, M.: Safecity: Understanding diverse forms of sexual harassment personal stories. In: Proc. of the 2018 Conf. on Empirical Methods in Natural Language Processing. pp. 2805–2811 (2018)

Nobata, C., Tetreault, J., Thomas, A., Mehdad, Y., Chang, Y.: Abusive language detection in online user content. In: Proc. of the 25th Intl. Conf. on world wide web. pp. 145–153 (2016)

Parikh, P., Abburi, H.,Badjatiya, Pinkesh, Krishnan, R., Chhaya, N.,Gupta, M.,Varma, Vasudeva: Multi-label categorization of accounts of sexism using a neural framework. In: Proc. of the 2019 Conf. on Empirical Methods in Natural Language Processing andthe 9th Intl. Joint Conf. on Natural Language Processing (EMNLP-IJCNLP).pp. 1642–1652 (2019)

Silva, L., Mondal, M., Correa, D., Benevenuto, F., Weber, I.: Analyzing the targets of hate in online social media. In: Proc. of the Intl. AAAI Conf. on Web and Social Media. vol. 10 (2016)

18 of 221

Agenda

  • Why is hate speech detection important?
  • Hate speech datasets
  • Feature based approaches
  • Deep learning methods
  • Multimodal hate speech detection
  • Challenges and limitations

19 of 221

Basic set of NLP features

  • Dictionaries
    • Content words and ngrams (such as insults and swear words, reaction words, personal pronouns) collected from www.noswearing.com
    • Hate verb lists [Gitari et al. 2015]
    • Hateful terms and phrases for hate speech based on race, disability and sexual orientation from Wiki pages [Burnap et al. 2016]
    • Acronyms and abbreviations and variants (using edit distance) of profane words
  • Bag of words
  • Ngrams: word and character.
  • TF-IDF, Part-of-speech, NER, dependency parsing.
  • Embeddings: Distributional bag of words (para2vec) [Djuric et al. 2015]
  • Topic Classification, Sentiment
  • Frequencies of personal pronouns in the first and second person, the presence of emoticons, and capital letters
  • Flesch-Kincaid Grade Level and Flesch Reading Ease scores
  • Binary and count indicators for hashtags, mentions, retweets, and URLs, as well as features for the number of characters, words, and syllables in each tweet.
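As a minimal illustration of the n-gram and TF-IDF features above, the sketch below (assuming scikit-learn; the toy texts and labels are placeholders) combines word- and character-level TF-IDF vectors and feeds them to a linear classifier:

```python
# Sketch: word + character n-gram TF-IDF features with a linear classifier.
# Assumes scikit-learn; texts/labels below are toy placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import FeatureUnion, make_pipeline
from sklearn.linear_model import LogisticRegression

texts = ["you people are the problem", "lovely weather today",
         "they should not be allowed here", "great match last night"]
labels = [1, 0, 1, 0]  # 1 = hateful, 0 = neutral (toy labels)

features = FeatureUnion([
    ("word_ngrams", TfidfVectorizer(analyzer="word", ngram_range=(1, 2))),
    ("char_ngrams", TfidfVectorizer(analyzer="char_wb", ngram_range=(3, 5))),
])

clf = make_pipeline(features, LogisticRegression(max_iter=1000))
clf.fit(texts, labels)
print(clf.predict(["you people again"]))
```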

Gitari, Njagi Dennis, Zhang Zuping, Hanyurwimfura Damien, and Jun Long. "A lexicon-based approach for hate speech detection." International Journal of Multimedia and Ubiquitous Engineering 10, no. 4 (2015): 215-230.

Fortuna, P., Nunes, S.: A survey on automatic detection of hate speech in text. ACM Computing Surveys (CSUR)51(4), 1–30 (2018)

Burnap, P., Williams, M.L.: Us and them: identifying cyber hate on twitter across multiple protected characteristics. EPJ Data science5, 1–15 (2016)

Djuric, Nemanja, Jing Zhou, Robin Morris, Mihajlo Grbovic, Vladan Radosavljevic, and Narayan Bhamidipati. "Hate speech detection with comment embeddings." In Proceedings of the 24th international conference on world wide web, pp. 29-30. 2015.

Davidson, T., Warmsley, D., Macy, M., Weber, I.: Automated hate speech detection and the problem of offensive language. In: Proc. of the Intl. AAAI Conf. on Web and Social Media. vol. 11 (2017)

20 of 221

More features

  • Linguistic
    • Length of comment in tokens, average length of word, number of punctuations, number of periods, question marks, quotes, and repeated punctuation
    • Number of one letter tokens, number of capitalized letters, number of URLs, number of tokens with non-alpha characters in the middle, number of discourse connectives, number of politeness words, number of modal words (to measure hedging and confidence by speaker)
    • Number of unknown words as compared to a dictionary of English words (meant to measure uniqueness and any misspellings), number of insult and hate blacklist words
  • Syntactic
    • Parent of node, grandparent of node, POS of parent, POS of grandparent, tuple consisting of the word, parent and grandparent, children of node,
    • Tuples consisting of the permutations of the word or its POS, the dependency label connecting the word to its parent, and the parent or its POS
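A minimal sketch (plain Python; the regexes and feature names are illustrative, not the exact feature set of Nobata et al.) of computing a few of the surface-level linguistic counts listed above:

```python
# Sketch: a few surface-level linguistic counts of the kind listed above.
import re

def surface_features(comment: str) -> dict:
    tokens = comment.split()
    return {
        "num_tokens": len(tokens),
        "avg_word_len": sum(len(t) for t in tokens) / max(len(tokens), 1),
        "num_punct": len(re.findall(r"[^\w\s]", comment)),
        "num_question_marks": comment.count("?"),
        "num_capital_letters": sum(c.isupper() for c in comment),
        "num_urls": len(re.findall(r"https?://\S+", comment)),
        "num_one_letter_tokens": sum(len(t) == 1 for t in tokens),
    }

print(surface_features("Why would ANYONE say that?? http://example.com"))
```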

Nobata, C., Tetreault, J., Thomas, A., Mehdad, Y., Chang, Y.: Abusive language detection in online user content. In: Proc. of the 25th Intl. Conf. on world wide web. pp. 145–153 (2016)

21 of 221

Classifiers/Regressors

  • SVMs
  • Logistic regression
  • Random forests
  • MLPs
  • Naïve Bayes
  • Ensemble
  • Stacked SVMs (base SVMs each trained on different features and then an SVM meta-classifier on top) [MacAvaney et al. 2019]
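A sketch of the stacked-SVM idea (base SVMs over different feature views, an SVM meta-classifier on top), approximated here with scikit-learn's StackingClassifier; the texts, labels and feature views are illustrative, not those used by MacAvaney et al.:

```python
# Sketch: stacked SVMs - base SVMs on different feature views,
# an SVM meta-classifier on top (scikit-learn approximation).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC
from sklearn.ensemble import StackingClassifier

texts = ["go back to your country", "you people ruin everything",
         "see you at the game tonight", "great recipe, thanks for sharing"]
labels = [1, 1, 0, 0]  # toy labels

word_svm = make_pipeline(TfidfVectorizer(analyzer="word"), SVC(probability=True))
char_svm = make_pipeline(TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
                         SVC(probability=True))

stack = StackingClassifier(
    estimators=[("word_view", word_svm), ("char_view", char_svm)],
    final_estimator=SVC(),
    cv=2,  # tiny toy dataset; real data would use more folds
)
stack.fit(texts, labels)
print(stack.predict(["go back home"]))
```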

Bhardwaj, M., Akhtar, M.S., Ekbal, A.,Das, Amitava, Chakraborty, Tanmoy: Hostility detection dataset in hindi. arXiv preprint arXiv:2011.03588 (2020)

MacAvaney, S., Yao, H.R., Yang, E., Russell, K., Goharian, N., Frieder, O.: Hate speech detection: Challenges and solutions. PloS one14(8), e0221152 (2019)

22 of 221

Agenda

  • Why is hate speech detection important?
  • Hate speech datasets
  • Feature based approaches
  • Deep learning methods
  • Multimodal hate speech detection
  • Challenges and limitations

23 of 221

Basic architectures

  • CNNs [Badjatiya et al. 2017]
  • LSTMs [Badjatiya et al. 2017]
  • FastText (avg word vectors) [Badjatiya et al. 2017]
    • CNN performed better than LSTM which was better than FastText [Badjatiya et al. 2017]
    • Best method is “LSTM + Random Embedding + GBDT”
  • MTL with Transformers [Chandra et al. 2020]
  • MTL with LSTMs [Suvarna et al. 2020]
  • Multi-label CNN+RNN [Karlekar et al. 2018]
  • Badjatiya, Pinkesh, Gupta, S.,Gupta, Manish, Varma, Vasudeva: Deep learning for hate speech detection in tweets. In: Proceedings of the 26th international conference on World Wide Web companion. pp. 759–760 (2017)
  • Chandra, M., Pathak, A., Dutta, E., Jain, P.,Gupta, Manish, Shrivastava, M., Kumaraguru,P.: Abuseanalyzer: Abuse detection, severity and target prediction for gab posts. In: Proc. of the 28th Intl. Conf. on Computational Linguistics. pp. 6277–6283 (2020)
  • Karlekar, S., Bansal, M.: Safecity: Understanding diverse forms of sexual harassment personal stories. In: Proc. of the 2018 Conf. on Empirical Methods in Natural Language Processing. pp. 2805–2811 (2018)
  • Suvarna, A., Bhalla, G.: # notawhore! a computational linguistic perspective of rape culture and victimization on social media. In: Proc. of the 58th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop. pp. 328–335 (2020)

[Suvarna et al. 2020]
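The "LSTM + Random Embedding + GBDT" combination above can be pictured with the pipeline-shape sketch below (an untrained toy LSTM in PyTorch feeding gradient-boosted trees from scikit-learn); this is not the trained model of Badjatiya et al.:

```python
# Sketch of the "LSTM + Random Embedding + GBDT" pipeline shape:
# tweet representations from an LSTM over randomly initialised embeddings
# are fed to a gradient-boosted tree classifier. Toy, untrained model.
import torch
import torch.nn as nn
from sklearn.ensemble import GradientBoostingClassifier

vocab_size, emb_dim, hid_dim = 1000, 32, 64
embedding = nn.Embedding(vocab_size, emb_dim)        # random embeddings
lstm = nn.LSTM(emb_dim, hid_dim, batch_first=True)

def tweet_embedding(token_ids: torch.Tensor) -> torch.Tensor:
    """Final LSTM hidden state as a fixed-size tweet representation."""
    with torch.no_grad():
        _, (h_n, _) = lstm(embedding(token_ids))
    return h_n[-1]                                    # (batch, hid_dim)

# toy integer-encoded tweets; real code would tokenize text and train the LSTM
tweets = torch.randint(0, vocab_size, (8, 12))
labels = [1, 1, 1, 1, 0, 0, 0, 0]

feats = tweet_embedding(tweets).numpy()
gbdt = GradientBoostingClassifier().fit(feats, labels)
print(gbdt.predict(feats[:2]))
```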

24 of 221

Skipped CNNs

  • Use ‘gapped window’ to extract features from input
  • We expect it to extract useful features such as
    • ‘muslim refugees ? troublemakers’
    • ‘muslim ? ? troublemakers’,
    • ‘refugees ? troublemakers’
    • ‘they ? ? deported’
  • A similar concept of atrous (or ‘dilated’) convolution has been used in image processing
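A minimal PyTorch sketch of the ‘gapped window’ idea via a dilated 1-D convolution (the embedding size and filter count are arbitrary here):

```python
# Sketch: a 'gapped window' over token embeddings via a dilated 1-D convolution.
# With kernel_size=2 and dilation=2 each filter sees tokens i and i+2,
# e.g. ('muslim', _, 'troublemakers').
import torch
import torch.nn as nn

emb_dim, num_filters = 50, 16
conv = nn.Conv1d(in_channels=emb_dim, out_channels=num_filters,
                 kernel_size=2, dilation=2)

tokens = torch.randn(1, 6, emb_dim)        # 1 sentence, 6 token embeddings
features = conv(tokens.transpose(1, 2))    # Conv1d expects (batch, channels, len)
print(features.shape)                      # torch.Size([1, 16, 4])
```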

Zhang, Z., Luo, L.: Hate speech detection: A solved problem? the challenging case of long tail on twitter. Semantic Web10(5), 925–945 (2019)

25 of 221

Leveraging metadata

Founta, A.M., Chatzakou, D., Kourtellis, N., Blackburn, J., Vakali, A., Leontiadis, I.: A unified deep learning architecture for abuse detection. In: Proc. of the 10th ACM Conf. on web science. pp. 105–114 (2019)

The individual classifiers that form the basis of the combined model. Left: the text-only classifier; right: the metadata-only classifier.

26 of 221

Leveraging metadata

  • Combination
    • Concatenate the text and metadata networks at their penultimate layer.
    • Ways to train
      • Train the entire network at once (naïve)
      • Transfer pretrained weights for both paths and freeze them while fine-tuning the rest
      • Transfer pretrained weights and fine-tune them
      • Interleaved training

Founta, A.M., Chatzakou, D., Kourtellis, N., Blackburn, J., Vakali, A., Leontiadis, I.: A unified deep learning architecture for abuse detection. In: Proc. of the 10th ACM Conf. on web science. pp. 105–114 (2019)

27 of 221

Data Augmentation

  • BERT performed the best; shallow classifiers performed comparably when trained on data augmented with a combination of three techniques, including GPT-2-generated sentences.
  • Methods
    • Simple oversampling: copying minority class datapoints to appear multiple times.
    • EDA (Wei and Zou, 2019): combines four text transformations (i) synonym replacement from WordNet, (ii) random insertion of a synonym, (iii) random swap of two words, (iv) random word deletion.
    • WordNet: Replacing words with random synonyms from WordNet by applying word sense disambiguation and inflection.
    • Paraphrase Database (PPDB): Replace equivalent phrases (controlled substitution by grammatical context)
      • For single words, the context is the POS tag; for multi-word paraphrases it also contains the syntactic category that appears after the original phrase in the PPDB training corpus.
    • Embedding neighbour substitutions: Produce top-10 nearest embedding neighbours (cosine similarity) of each word selected for replacement, and randomly pick the new word from these.
      • Twitter word embeddings (GLOVE)
      • Subword embeddings (BPEMB): BPEMB (Heinzerling and Strube, 2018) provides pre-trained SentencePiece GloVe embeddings.
    • Majority class sentence addition (ADD)
      • Add a random sentence from a majority class document in SEED to a random position in a copy of each minority class training document.
    • GPT-2 conditional generation
      • 110M parameter GPT-2. Train GPT-2 on minority class documents in SEED. Generate N 1 novel documents for all minority class samples x in SEED. Assign the minority class label to all documents, and merge them with SEED.
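A minimal sketch of two of the EDA-style transformations (random swap and random deletion); illustrative only, not the full EDA implementation, which also does WordNet synonym replacement and random insertion:

```python
# Sketch: two EDA-style augmentations - random swap and random deletion.
import random

def random_swap(tokens, n=1):
    tokens = tokens[:]
    for _ in range(n):
        i, j = random.sample(range(len(tokens)), 2)
        tokens[i], tokens[j] = tokens[j], tokens[i]
    return tokens

def random_deletion(tokens, p=0.1):
    kept = [t for t in tokens if random.random() > p]
    return kept if kept else [random.choice(tokens)]  # never return an empty list

text = "they should all be deported immediately".split()
print(random_swap(text))
print(random_deletion(text))
```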

Juuti, M., Grondahl, T., Flanagan, A., Asokan, N.: A little goes a long way: Improving toxic language classification despite data scarcity. In: Proc. of the 2020 Conf. on Empirical Methods in Natural Language Processing: Findings. pp. 2991–3009 (2020)

28 of 221

Tackling character-level adversarial attack

  • Intentionally or deliberately misspelled words are a kind of adversarial attack commonly adopted as a tool in manipulators’ arsenal to evade detection.

    • ‘nigger’ 🡪 ‘n1gger’ or ‘nigga’
  • Solution: use both word-level and subword-level (phonetic and char) semantics.
  • Train Phonetic-Level Embedding while end-to-end training.
  • Most significant word recognition.
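For intuition, a naive lexicon-style normalisation of common character substitutions is sketched below (the mapping and lexicon entry are made up); this is NOT the SWE2 model, which instead learns subword- and phonetic-level embeddings end-to-end:

```python
# Sketch: naive normalisation of common character substitutions before a
# lexicon lookup. SWE2 avoids such brittle rules by learning character- and
# phonetic-level embeddings instead.
LEET_MAP = str.maketrans({"1": "i", "0": "o", "3": "e", "@": "a", "$": "s"})

def normalise(token: str) -> str:
    return token.lower().translate(LEET_MAP)

hate_lexicon = {"idiots"}                  # placeholder lexicon entry
print(normalise("1D10TS"), normalise("1D10TS") in hate_lexicon)
```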

Mou, G., Ye, P., Lee, K.: Swe2: Subword enriched and significant word emphasized frame-work for hate speech detection. In: Proc. of the 29th ACM Intl. Conf. on Information & Knowledge Management. pp. 1145–1154 (2020)

29 of 221

Tackling character-level adversarial attack

  • Character-level and phonetic-level embeddings for the target word.
  • Word embedding (BERT/FastText) for before/after words.

Mou, G., Ye, P., Lee, K.: Swe2: Subword enriched and significant word emphasized frame-work for hate speech detection. In: Proc. of the 29th ACM Intl. Conf. on Information & Knowledge Management. pp. 1145–1154 (2020)

Performance of our SWE2 models and baselines without the adversarial attack

Accuracy of our SWE2 model and the best baseline under the adversarial attack

30 of 221

Multi-label classification

Parikh, P., Abburi, H.,Badjatiya, Pinkesh, Krishnan, R., Chhaya, N.,Gupta, M.,Varma, Vasudeva: Multi-label categorization of accounts of sexism using a neural framework. In: Proc. of the 2019 Conf. on Empirical Methods in Natural Language Processing andthe 9th Intl. Joint Conf. on Natural Language Processing (EMNLP-IJCNLP).pp. 1642–1652 (2019)

31 of 221

Multi-label classification

  • Word embeddings: GloVe, ELMo, fastText, linguistic features
  • Sentence embeddings: BERT, USE, InferSent.
  • Single-label Transformations
    • The Label Powerset (LP) method
      • treats each distinct combination of classes existing in the training set as a separate class.
      • The standard cross-entropy loss can then be used along with softmax.
    • Binary relevance (BR)
      • An independent binary classifier is trained to predict the applicability of each label in this method.
      • This entails training a total of L classifiers, making BR computationally very expensive.
      • Disregards correlations existing between labels.
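A small scikit-learn sketch contrasting the two transformations (toy texts and a two-label toy matrix; Binary Relevance via OneVsRestClassifier, Label Powerset via mapping each label combination to a single class id):

```python
# Sketch: Binary Relevance (one classifier per label) vs Label Powerset
# (each distinct label combination becomes one class).
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

texts = ["account of role stereotyping", "account of body shaming",
         "account mentioning both", "unrelated post"]
Y = np.array([[1, 0], [0, 1], [1, 1], [0, 0]])   # toy multi-label matrix

X = TfidfVectorizer().fit_transform(texts)

# Binary Relevance: L independent binary classifiers
br = OneVsRestClassifier(LogisticRegression()).fit(X, Y)

# Label Powerset: encode each distinct row of Y as a single class id
combo_ids = {}
y_lp = [combo_ids.setdefault(tuple(row), len(combo_ids)) for row in Y]
lp = LogisticRegression(max_iter=1000).fit(X, y_lp)

print(br.predict(X[:1]), lp.predict(X[:1]))
```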

Parikh, P., Abburi, H.,Badjatiya, Pinkesh, Krishnan, R., Chhaya, N.,Gupta, M.,Varma, Vasudeva: Multi-label categorization of accounts of sexism using a neural framework. In: Proc. of the 2019 Conf. on Empirical Methods in Natural Language Processing andthe 9th Intl. Joint Conf. on Natural Language Processing (EMNLP-IJCNLP).pp. 1642–1652 (2019)

32 of 221

Multi-label classification

Parikh, P., Abburi, H.,Badjatiya, Pinkesh, Krishnan, R., Chhaya, N.,Gupta, M.,Varma, Vasudeva: Multi-label categorization of accounts of sexism using a neural framework. In: Proc. of the 2019 Conf. on Empirical Methods in Natural Language Processing andthe 9th Intl. Joint Conf. on Natural Language Processing (EMNLP-IJCNLP).pp. 1642–1652 (2019)

33 of 221

Agenda

  • Why is hate speech detection important?
  • Hate speech datasets
  • Feature based approaches
  • Deep learning methods
  • Multimodal hate speech detection
  • Challenges and limitations

34 of 221

Cyberbullying on the Instagram Social Network

  • Is an image bully-prone?
  • Features
    • Text: BOW, Offensiveness (dependency parse+dictionary), Word2Vec.
    • Image
      • SIFT, color histogram, GIST (captures naturalness, openness, roughness, expansion, and ruggedness, i.e., the spatial structure of a scene.)
      • CNN-Cl: Clustering results on 1000*1900 activation matrix from AlexNet for 1900 images.
      • Captions: LDA with 50 topics.
  • User: number of posts; followed-by; replies to this post; average total replies per follower.

Zhong, H., Li, H., Squicciarini, A.C., Rajtmajer, S.M., Griffin, C., Miller, D.J., Caragea, C.:Content-driven detection of cyberbullying on the instagram social network. In: IJCAI. vol. 16,pp. 3952–3958 (2016)

Classification results using SVM with an RBF kernel, given various (concatenated) feature sets. BoW=Bag of Words; OFF=Offensiveness score; Captions=LDA-generated topics from image captions;

CNN-Cl=Clusters generated from outputs of a pre-trained CNN over images

35 of 221

Unsupervised cyberbullying detection

Cheng, L., Shu, K., Wu, S., Silva, Y.N., Hall, D.L., Liu, H.: Unsupervised cyberbullying detection via time-informed gaussian mixture model. In: Proc. of the 29th ACM Intl. Conf. on Information & Knowledge Management. pp. 185–194 (2020)

36 of 221

Unsupervised cyberbullying detection

  • UCDXtext. UCD without HAN.
  • UCDXtime. UCD without time interval prediction.
  • UCDXgraph. UCD without GAE.
  • UCD achieves the best performance in Recall, F1, AUROC, and competitive Precision compared to the unsupervised baselines for both datasets.

Cheng, L., Shu, K., Wu, S., Silva, Y.N., Hall, D.L., Liu, H.: Unsupervised cyberbullying detection via time-informed gaussian mixture model. In: Proc. of the 29th ACM Intl. Conf. on Information & Knowledge Management. pp. 185–194 (2020)

37 of 221

Multimodal Twitter: MMHS150K

  • We find that even though images are useful for the hate speech detection task, current multimodal models cannot outperform models analyzing only text.

  • Unimodal
    • Images: Imagenet pre-trained Google Inception v3 features
    • Tweet Text: 1-layer 150D LSTM using 100D GloVe.
    • Image Text: from the Google Vision API text detection module. 1-layer 150D LSTM using 100D GloVe.

  • Multimodal
    • CNN+RNN models with three inputs: tweet image, tweet text and image text
      • Feature Concatenation Model (FCM)
      • Spatial Concatenation Model (SCM)
      • Textual Kernels Model (TKM)

Gomez, R., Gibert, J., Gomez, L., Karatzas, D.: Exploring hate speech detection in multi-modal publications. WACV. pp. 1470–1478 (2020)

38 of 221

Multimodal Twitter: MMHS150K

Gomez, R., Gibert, J., Gomez, L., Karatzas, D.: Exploring hate speech detection in multi-modal publications. WACV. pp. 1470–1478 (2020)

39 of 221

Hateful Memes Challenge

Kiela, D., Firooz, H., Mohan, A., Goswami, V., Singh, A., Ringshia, P., Testuggine, D.: The hateful memes challenge: Detecting hate speech in multimodal memes. Advances in Neural Information Processing Systems33(2020)

  • Multi-modal hate: benign confounders were found for both modalities
  • unimodal hate: one or both modalities were already hateful on their own
  • benign image and benign text confounders
  • random not-hateful examples

40 of 221

Hateful Memes Challenge

  • Image encoders
    • Image-Grid: standard ResNet-152 from res-5c with average pooling
    • Image Region: fc6 layer of Faster-RCNN with ResNeXt152 backbone
  • Text encoder: BERT
  • Multimodal
    • Late Fusion: mean of ResNet-152 and BERT output
    • ConcatBERT: concat ResNet-152 features with BERT and training an MLP on top
    • MMBT-Grid and MMBT-Region: supervised multimodal bi-transformers using Image-Grid/Image-Region
    • ViLBERT and Visual BERT, either unimodally pretrained or pretrained on multimodal data

Kiela, D., Firooz, H., Mohan, A., Goswami, V., Singh, A., Ringshia, P., Testuggine, D.: The hateful memes challenge: Detecting hate speech in multimodal memes. Advances in Neural Information Processing Systems33(2020)

  • Text-only classifier performs slightly better than the vision-only classifier.
  • The multimodal models do better
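A minimal PyTorch sketch of the concatenation-style fusion ('ConcatBERT') idea listed above; random tensors stand in for the ResNet-152 and BERT features, and the layer sizes are illustrative:

```python
# Sketch: concatenate a pooled image feature with a pooled text feature and
# classify with an MLP (the ConcatBERT-style fusion idea).
import torch
import torch.nn as nn

img_dim, txt_dim = 2048, 768               # e.g. ResNet-152 pool / BERT [CLS]

fusion_mlp = nn.Sequential(
    nn.Linear(img_dim + txt_dim, 256),
    nn.ReLU(),
    nn.Linear(256, 2),                     # hateful vs not-hateful
)

image_feat = torch.randn(4, img_dim)       # stand-in for image encoder output
text_feat = torch.randn(4, txt_dim)        # stand-in for text encoder output
logits = fusion_mlp(torch.cat([image_feat, text_feat], dim=1))
print(logits.shape)                        # torch.Size([4, 2])
```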

41 of 221

Multi-modal hate speech detection

Das, A., Wahi, J.S., Li, S.: Detecting hate speech in multi-modal memes. arXiv preprint arXiv:2012.14891 (2020)

Fine-tune Visual BERT and BERT on the Facebook hateful memes dataset and on the captions generated from its images.

RoBERTa for text encoding. VGG for visual sentiments.

42 of 221

Agenda

  • Why is hate speech detection important?
  • Hate speech datasets
  • Feature based approaches
  • Deep learning methods
  • Multimodal hate speech detection
  • Challenges and limitations

43 of 221

Challenges

  • Low agreement in hate speech classification by humans, indicating that this classification would be harder for machines
    • The task requires expertise about culture and social structure
  • The evolution of social phenomena and language makes it difficult to track all racial and minority insults
    • Language evolves quickly, in particular among young populations that communicate frequently in social networks
    • Some insults which might be unacceptable to one group may be totally fine to another group, and thus the context of the blacklist word is all important
  • Abusive language may be very fluent and grammatically correct, can cross sentence boundaries, and the use of sarcasm in it is also common
  • Hate speech detection is more than simple keyword spotting
    • Obfuscations such as ni99er, whoopiuglyniggerratgolberg and JOOZ make it impossible for simple keyword spotting metrics to be successful, especially as there are many permutations to a source word or phrase.

Fortuna, P., Nunes, S.: A survey on automatic detection of hate speech in text. ACM Computing Surveys (CSUR)51(4), 1–30 (2018)

Nobata, C., Tetreault, J., Thomas, A., Mehdad, Y., Chang, Y.: Abusive language detection in online user content. In: Proc. of the 25th Intl. Conf. on world wide web. pp. 145–153 (2016)

44 of 221

Limitations of existing methods

  • Interpretability: Systems that automatically censor a person’s speech likely need a manual appeal process.
  • Circumvention
    • Those seeking to spread hateful content actively try to find ways to circumvent measures put in place.
    • E.g., posting the content as images containing the text, rather than the text itself.

MacAvaney, S., Yao, H.R., Yang, E., Russell, K., Goharian, N., Frieder, O.: Hate speech detection: Challenges and solutions. PloS one14(8), e0221152 (2019)

45 of 221

Thanks Q&A

46 of 221

SLOT-II

47 of 221

Agenda

  • Revisiting Meta Data Context for Hate Detection
  • Inter and Intra User Context for Hate Detection
  • Network Characteristics of Hateful Users
  • Diffusion Modeling of Hateful Text
  • Predicting Spread of Hate among Retweeters
  • Predicting Spread of Hate among Replies

48 of 221

Some Interesting observations

Table 1:

Table 2:

Table 3:

  • Table 1: Hatefulness of different users towards different hashtags. (RETINA) [1]
  • Table 2: Hatefulness of reply threads over time. (DESSRt) [2]
  • Table 3: Hatefulness of reply threads of coeval topics. (DRAGNET) [3]

[1]: Hate is the New Infodemic: A Topic-aware Modeling of Hate Speech Diffusion on Twitter: https://arxiv.org/pdf/2010.04377.pdf

[2]: Would Your Tweet Invoke Hate on the Fly? Forecasting Hate Intensity of Reply Threads on Twitter: https://dl.acm.org/doi/10.1145/3447548.3467150

[3]: Better Prevent than React: Deep Stratified Learning to Predict Hate Intensity of Twitter Reply Chains: https://ieeexplore.ieee.org/document/9679052

49 of 221

Revisiting Metadata and Network Context

  • Content based:
    • Number of hashtags, mentions
    • Number of words in uppercase
    • Sentiment scores: overall and emotion specific
  • Network based:
    • Number of followers, friends
    • The user’s network position, i.e., hub, centrality, authority, clustering coefficient
  • User based:
    • Number of posts, favorited tweets, subscribed lists
    • Age of account

A Unified Deep Learning Architecture for Abuse Detection: https://arxiv.org/abs/1802.00385

50 of 221

Inter and Intra user history context

  • Intra-user representation: User History/timeline.
  • Inter-user representation: Set of semantically similar tweets in the corpus.
  • Adding intra-user attributes reduces false positives.
  • This study shows that the user’s network and timeline activity play a major role in the generation and spread of hate speech. Using only textual attributes is not sufficient to create a detection model for social media.

Leveraging Intra-User and Inter-User Representation Learning for Automated Hate Speech Detection: https://aclanthology.org/N18-2019.pdf

51 of 221

Network Characteristics of Hateful Users

  • Source: Twitter
  • A sampled retweet graph with 100k users and 2.2M retweet edges, along with the 200 most recent tweets of each user.
  • Transition matrix capturing how a user is influenced by the users he/she retweets.
  • Initialize a hatefulness vector p0, with the ith entry set to 1 if the ith user employed any hateful word from the sampled lexicon, else 0.
  • Generate the overall hatefulness of each user from the user’s own profile and the profiles of the people they follow, iterating to convergence: pt ≈ pt-1 (see the sketch below).
  • Divide the users into 4 strata of potential hatefulness based on p intervals [0, 0.25), [0.25, 0.50), [0.50, 0.75) and [0.75, 1].
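The iteration can be pictured with the small NumPy sketch below; the 3-user matrix, the 0.5 mixing weight and the exact update rule are illustrative assumptions, not the paper's formulation:

```python
# Sketch: smooth an initial hatefulness vector p0 over a row-normalised
# retweet-influence matrix A until p changes very little, then stratify.
import numpy as np

A = np.array([[0.0, 1.0, 0.0],      # user 0 retweets user 1
              [0.0, 0.0, 1.0],      # user 1 retweets user 2
              [0.5, 0.5, 0.0]])     # user 2 retweets users 0 and 1
p = p0 = np.array([1.0, 0.0, 0.0])  # only user 0 used a lexicon word

for _ in range(100):
    p_next = 0.5 * p0 + 0.5 * A @ p  # mix own signal with retweeted users' scores
    if np.abs(p_next - p).max() < 1e-6:
        break
    p = p_next

strata = np.digitize(p, [0.25, 0.5, 0.75])  # 4 potential-hatefulness strata
print(p.round(3), strata)
```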

Characterizing and Detecting Hateful Users on Twitter: https://arxiv.org/pdf/1803.08977.pdf

52 of 221

Network Characteristics of Hateful Users

  • Hateful users & their neighbours tend to have newer accounts.
  • Hateful users & their neighbours tend to tweet more and at shorter intervals, and follow more users.
  • Hateful users & their neighbours are more “central” and densely connected together.
  • Hateful users & their neighbours use more profane words but fewer words that express emotions such as shame or sadness.
  • Interestingly, the authors concluded that hateful users do not behave like spammers based on hashtag and URL usage.

Characterizing and Detecting Hateful Users on Twitter: https://arxiv.org/pdf/1803.08977.pdf

53 of 221

Diffusion and User Modeling of Hateful Text

  • Source: Gab, as it promotes “free speech”: 21M posts by 341K users between Oct 2016 and June 2018
  • Network-level features
    • Follower-followee network (61.1k nodes and 156.1k edges)
  • User-level features
    • # posts, likes, dislikes, replies, reposts
    • Profile score
    • Follower-followee ratio
  • The authors curated their own list of hateful lexicons.
  • Initial hateful users were those who were active and had used at least one term from the hate lexicon.
  • 1.5k users labeled as hateful (0.3%)

Spread of hate speech in online social media: https://arxiv.org/abs/1812.01693

54 of 221

Diffusion and User Modeling of Hateful Text

  • The posts of hateful users diffuse significantly farther, wider, deeper and faster than non-hateful ones.
  • Hateful users are more proactive and cohesive (denser subnetwork for hateful users).
  • Based on the above observations one can say that hateful users are more influential.
  • Based on the number of posts generated it was observed that the 0.3% of hateful users are responsible for around 18% of the content.

Spread of hate speech in online social media: https://arxiv.org/abs/1812.01693

KH: Known hateful users and NH: Non hateful users

55 of 221

Additional Studies

  1. Examining Untempered Social Media: Analyzing Cascades of Polarized Conversations (Gab) [1]
    1. Stronger ties between users who engage on each other’s posts related to controversial and hateful topics.
    2. Most information cascades start in a linear fashion, but end up branched, which is a sign of the spread of controversy on Gab.
  2. Measuring #GamerGate: A Tale of Hate, Sexism, and Bullying on Twitter [2]
    • Studies users involved in #gamergate vs random users.
    • Users spreading hate/harassment tend to use more hashtags, but are more likely to use @ to either incite their peers or directly attack their counterparts.
    • They tend to have more followers & followees.
    • 25% of their tweets are negative in sentiment (compared to 15% for random users). Their avg. offense score based on the Hatebase lexicon is 0.25 (0.06 for random users).

[1]: Examining Untempered Social Media: Analyzing Cascades of Polarized Conversations (Gab): https://www.computer.org/csdl/proceedings-article/asonam/2019/09072961/1jjAcsAe3zG
[2]: Measuring #GamerGate: A Tale of Hate, Sexism, and Bullying on Twitter: https://arxiv.org/abs/1702.07784

56 of 221

Limitations of Existing Diffusion Analysis

  • Only exploratory analysis of users, hashtags or posts.
  • Consider hateful and non-hateful users to be separate groups; the real world is fuzzier.
  • Cascade models do not take content into account, only who follows whom.

57 of 221

Hate Diffusion on Tweet Retweets

Hate is the New Infodemic: A Topic-aware Modeling of Hate Speech Diffusion on Twitter: https://arxiv.org/pdf/2010.04377.pdf

58 of 221

Hate Diffusion on Tweet Retweets

  • User history-based features
    • TF-IDF features over n-grams (n = 1, 2)
    • Hate lexicon vector (length = 209)
    • Hate tweets/ Non-hate tweets
    • Hate tweet retweeters/ Non-hate tweet retweeters
    • Follower Count
    • Account Creation Date
    • No. of topics on which the user has tweeted
  • Topic (hashtag)-oriented feature
    • Cosine similarity (tweet text and hashtag)
  • Non-peer endogenous features
  • Exogenous feature (News crawled)

Hate is the New Infodemic: A Topic-aware Modeling of Hate Speech Diffusion on Twitter: https://arxiv.org/pdf/2010.04377.pdf

59 of 221

Hate Diffusion on Tweet Retweets: RETINA model

a) Exogenous attention

b) Static Retweet prediction Model

c) Dynamic Retweet Prediction Model

Hate is the New Infodemic: A Topic-aware Modeling of Hate Speech Diffusion on Twitter: https://arxiv.org/pdf/2010.04377.pdf

60 of 221

Hate Diffusion on Tweet Retweets: RETINA model

Marked entries signify models without exogenous influence

Hate is the New Infodemic: A Topic-aware Modeling of Hate Speech Diffusion on Twitter: https://arxiv.org/pdf/2010.04377.pdf

Fig1

Fig2

61 of 221

Hate Diffusion on Tweet Replies

  • Curated 4k source tweets and ~ 200 reply threads.
  • Hate intensity is computed by combining a classifier-based and a lexicon-based approach.
  • No generic pattern emerges.

Would Your Tweet Invoke Hate on the Fly? Forecasting Hate Intensity of Reply Threads on Twitter: https://dl.acm.org/doi/10.1145/3447548.3467150

62 of 221

Hate Diffusion on Tweet Replies: DESSRt Model

Would Your Tweet Invoke Hate on the Fly? Forecasting Hate Intensity of Reply Threads on Twitter: https://dl.acm.org/doi/10.1145/3447548.3467150

63 of 221

Hate Diffusion on Tweet Replies: DESSRt Model

Model shows consistent performance irrespective of the type of source user and source tweet.

Would Your Tweet Invoke Hate on the Fly? Forecasting Hate Intensity of Reply Threads on Twitter: https://dl.acm.org/doi/10.1145/3447548.3467150

Fig: 1

Fig: 2

64 of 221

Hate Diffusion on Tweet Replies: DRAGNET model

Better Prevent than React: Deep Stratified Learning to Predict Hate Intensity of Twitter Reply Chains: https://ieeexplore.ieee.org/document/9679052

65 of 221

Hate Diffusion on Tweet Replies: DRAGNET model

Better Prevent than React: Deep Stratified Learning to Predict Hate Intensity of Twitter Reply Chains: https://ieeexplore.ieee.org/document/9679052

66 of 221

Hate Diffusion on Tweet Replies: DRAGNET model

Better Prevent than React: Deep Stratified Learning to Predict Hate Intensity of Twitter Reply Chains: https://ieeexplore.ieee.org/document/9679052

67 of 221

Real-World Deployments of Hate Diffusion Models

  • The RETINA model is being deployed as part of HELIOS (Hate, Hyperpartisan, and Hyperpluralism Elicitation and Observer System), in collaboration with IITP, UT Austin and Wipro AI.
    • Offline model
  • The DESSRt and DRAGNET models are being deployed as part of a partnership with Logically.
    • On-the-fly predictions

68 of 221

Limitations and Future Scope

  • Scraping large datasets and large networks from social media sites is subject to API constraints.
  • Large-scale annotation of hate speech datasets requires some form of training of the annotators and can be costly for non-English languages.
  • Use of hate lexicons in hate diffusion models can restrict the ability of the models to capture dynamic, ever-changing forms of hate.
  • Most diffusion analysis focuses on hateful text content, while other modalities remain unexplored.
  • In certain contexts there seems to be a relation between the spread of fake news/rumours and an increase in hateful behaviour online/offline. Capturing such inter-domain knowledge can help in the early detection of hateful content.

  • Hate speech detection underpins hate diffusion and network analysis. Thus, limitations of detection overflow into diffusion.

69 of 221

Thanks Q&A

70 of 221

  • Slot I: (60 mins)
    • Introduction: 20 mins (Tanmoy)
    • Hate Speech Detection: 30 mins (Manish)
    • Questions: (10 mins)
  • Slot II: (50 mins)
    • Hate Speech Diffusion: 40 mins (Sarah)
    • Questions: (10 mins)

Break (5 mins)

  • Slot III: (60 mins)
    • Psychological Analysis of Hate Spreaders: 25 mins (Amitava)
    • Intervention Measures for Hate Speech: 25 mins (Sarah)
    • Questions: (10 mins)
  • Slot IV: (65 mins)
    • Overview of Bias in Hate Speech: 25 mins (Pinkesh)
    • Current Developments: 25 mins (Preslav)
    • Future Scope & Concluding Remarks: 5 mins (Tanmoy/Sarah)
    • Questions: (10 mins)

71 of 221

SLOT-III

72 of 221

Psychological Analysis of Online Hate Spreaders

Amitava Das

73 of 221

Agenda

  • Psychological Analysis of Online Hate Spreaders
    • Personality Models
    • Value Models
    • Empathy Models
    • Confirmation Bias
  • Intervention Strategy
    • Data Collection for Intervention
    • Reactive vs Proactive Strategy
    • Dynamics of Hate and Counter Speech Online

74 of 221

75-142 of 221 (no text content extracted for these slides)

143 of 221

Intervention Strategies for Online Hate

Sarah Masud

144 of 221

Agenda

  • Psychological Analysis of Online Hate Spreaders
    • Personality Models
    • Value Models
    • Empathy Models
    • Confirmation Bias
  • Intervention Strategy
    • Data Collection for Intervention
    • Reactive vs Proactive Strategy
    • Dynamics of Hate and Counter Speech Online.

145 of 221

Countering Hateful Content on Social Media

  • Reactive countering: when a hateful post has already been made and we intervene to prevent it from spreading further.
    • Warn the user who has posted.
    • Generate a text that counters the existing hate.
    • Report/flag or block the users.
    • Ask influential members of the community to help spread the counter-narrative.
  • Proactive countering: intervene before the post goes public.
    • Prompt-based detection of offensive words and phrases in a sentence.
    • Approval of content by a group moderator, especially for past instigators.

146 of 221

Data Collection Strategy for Counter Narration

  • CRAWL: (Real-world samples of both hate and counter-hate)
  • CROWD: (Real-world samples of hate and synthetic samples of counter-hate)
  • NICHE: (Synthetic samples of both hate and counter-hate)

Generating Counter Narratives against Online Hate Speech: Data and Strategies: https://arxiv.org/pdf/2004.04216.pdf

Table 1: Characteristics of collection methods

147 of 221

Analyzing the hate and counter speech accounts on Twitter

  • Obtained a dataset via the crawling strategy, using a template-based approach to capture hate.
  • Post annotation: 558 unique hate tweets from 548 users and 1290 counterspeech replies from 1239 users.

Analyzing the hate and counter speech accounts on Twitter: https://arxiv.org/pdf/1812.02712.pdf

148 of 221

Analyzing the hate and counter speech accounts on Twitter

  • Hateful accounts tend to express more negative sentiment and profanity in general.
  • Based on the Big Five (B5) personality traits, hateful users tend to be more extroverted, while counterspeech users tend to be more conscientious and open-minded.
  • Another intriguing finding is that hateful users also act as counterspeech users in some situations. In the studied dataset, such users use hostile language as a counterspeech measure 55% of the time.
  • Different target communities adopt different measures to respond to the hateful tweet.
  • These lexical, network and emotion features in a user’s timeline can be used to distinguish counter-hate accounts, and policies can promote their content instead.

Table 1

Table 2

Analyzing the hate and counter speech accounts on Twitter: https://arxiv.org/pdf/1812.02712.pdf

149 of 221

CONAN: Multilingual Parallel Counter Dataset

(NICHE Sourcing)

  • For language EN, FR, IT:
    • NGO operators generate prototypical Islamophobic hate speech samples (2 native speakers per language).
    • NGO operators generate counter narrative samples.
    • Another set of non-expert crowdworkers perform fine-grained labelling of hate and counter hate samples.
      • Paraphrasing and translation also performed.

CONAN - COunter NArratives through Nichesourcing: a Multilingual Dataset of Responses to Fight Online Hate Speech: https://arxiv.org/pdf/1910.03270.pdf

150 of 221

CONAN: Multilingual Parallel Counter Dataset

(NICHE Sourcing)

Fine-grained Hate Class

  • Culture
  • Economics
  • Crimes
  • Rapism
  • Terrorism
  • Women
  • History
  • Others

Fine-grained Counter-Hate Class

  • Affiliation
  • Denouncing
  • Facts
  • Humour
  • Hypocrisy
  • Negative
  • Positive
  • Question
  • Consequences
  • Others

CONAN - COunter NArratives through Nichesourcing: a Multilingual Dataset of Responses to Fight Online Hate Speech: https://arxiv.org/pdf/1910.03270.pdf

Pointers adopted by NGO for counter-narrative generation

  • Objectiveness
  • Non-abusive
  • Recall influential users and profiles
  • Credibility of facts
  • Supportive tone for vulnerable groups

151 of 221

Author-Reviewer Architecture

  • Author generates the HS-CN pairs (Manual or Machine)
  • Reviewers review the generated pairs for consistency and diversity of content. (Manual or Machine)
  • Validators make final grammatical edits and accept/reject samples. (Manual)

Generating Counter Narratives against Online Hate Speech: Data and Strategies: https://arxiv.org/pdf/2004.04216.pdf

152 of 221

Author-Reviewer Architecture

Pipeline: START → authoring via machine-generated counter text → reviewing via machine classification of HS-CN pairs → manual validation → END

Generating Counter Narratives against Online Hate Speech: Data and Strategies: https://arxiv.org/pdf/2004.04216.pdf

153 of 221

Empathy based Counter Speech

  • Field-tested 3 popular counter-narrative strategies: humour, warning of consequences, and inducing empathy (here the narrative empathises with the victim).
  • The field experiment ran on Twitter from Nov 2020 to Jan 2021.
  • Xenophobic tweets were curated, yielding from 65 to 115 xenophobic tweets per day.
  • The authors of these tweets were randomly assigned to one of the 3 treatment variants or to a control group that received no intervention.
  • During the intervention, public responses were generated as counter-narratives from non-bot accounts. These accounts were created for the purpose of the field experiment.

Empathy-based counterspeech can reduce racist hate speech in a social media field experiment: https://www.pnas.org/content/118/50/e2116310118

154 of 221

Empathy based Counter Speech

  • Once intervened upon, the instigator accounts were monitored for a period of 6 weeks, observing:
    • deletion of past xenophobic tweets,
    • future creation of xenophobic tweets,
    • VADER sentiment scores (negative sentiment) of tweets in the follow-up period.
  • It was observed that empathy-based countering reduced subsequent xenophobic tweets and increased deletion of past ones, while the other strategies showed no consistent effects.

155 of 221

Proactive Strategies

  • Subreddit content moderation (threads can be flagged as offensive by the moderators) [1]
  • Facebook Groups: posting and commenting only with the approval of moderators.
  • Social media platforms like Twitter and Facebook appoint content moderators to examine flagged and potentially harmful content.
  • However, regular monitoring of such content can be stressful for humans [2].
    • Make use of semi-automatic flagging of content.

[1]: https://www.wired.com/story/the-punishing-ecstasy-of-being-a-reddit-moderator/
[2]: https://www.theverge.com/2019/2/25/18229714/cognizant-facebook-content-moderator-interviews-trauma-working-conditions-arizona

156 of 221

Offensive to Non-Offensive Unsupervised Style Transfer

Si and Sj represent the two styles: offensive and non-offensive. An unsupervised method that uses a non-parallel (unlabeled) corpus.

Fighting Offensive Language on Social Media with Unsupervised Text Style Transfer: https://arxiv.org/pdf/1805.07685.pdf

157 of 221

Reconsidering Tweets: Intervening During Tweet Creation

  • A study conducted by researchers at Twitter on intervening before an offensive reply is posted.
  • The study ran from Feb to April 2021.
  • 200k users enrolled in the study; 50% were randomly assigned to the control group.
  • H1: Are prompted users less likely to post the current offensive content?
  • H2: Are prompted users less likely to post offensive content in the future?
  • H3: Are prompted users less likely to receive engagement on their content?
  • H4: Additionally, as a consequence of reconsidering, are prompted users less likely to delete their seemingly offensive past tweets in the future?

Reconsidering Tweets: Intervening During Tweet Creation Decreases Offensive Content https://arxiv.org/abs/2112.00773

158 of 221

Reconsidering Tweets: Intervening During Tweet Creation

  • H1: Are prompted users less likely to post the current offensive content?
    • Yes; a decrease was observed in the number of offensive replies sent post-intervention.
    • No significant change in the total replies sent, so prompting does not impact non-offensive interactions.
  • H4: Additionally, as a consequence of reconsidering, are prompted users less likely to delete their seemingly offensive past tweets in the future?
    • No significant evidence was found in support of this.
  • H3: Are prompted users less likely to receive engagement on their content?
    • Yes: since hate begets hate, prompting users to reconsider (and actual reconsideration) reduces the overall offensiveness of a thread and thereby the offensiveness of further engagement on it.
  • H2: Are prompted users less likely to post offensive content in the future?
    • Yes; prompted users are less likely to produce prompt-eligible tweets in the future, and hopefully remain cognizant.

Reconsidering Tweets: Intervening During Tweet Creation Decreases Offensive Content https://arxiv.org/abs/2112.00773

159 of 221

Thanks Q&A

160 of 221

SLOT-IV

161 of 221

Agenda

  • Analysis of Bias in Hate Speech Detection
    • Data bias
    • Model bias
    • Other types of bias
    • Mitigation Strategies
  • Current Direction and Future Scope
    • Fine-grained hate speech classification
    • Exploring Zero and Few shot learning
    • Cross Lingual and Multilingual Hate Detection
    • Limits of existing few shot modeling for Multilinguality
    • Key Takeaways and Future Scope

162 of 221

Analysis of Bias in Hate Speech Detection

Pinkesh Badjatiya

163 of 221

Agenda

  • What is bias in the context of hate speech?
  • Source of bias
  • Societal Impact of biased predictions
  • Mitigating biases in learning
  • Challenges and Limitations

164 of 221

Definition

  • Bias is an error from erroneous assumptions in the learning algorithm.
    • Could be due to errors in the learning algorithm or the data.
  • Stereotypical Bias (SB): In social psychology, a stereotype is an over-generalized belief about a particular category of people.
    • In the context of hate speech, we define SB as an over-generalized belief about a word being Hateful or Neutral.
    • For Example – attributing the word muslim to hate/violence
  • Stereotypical Bias can be based on typical perspectives like skin tone, gender, race, demography, disability, Arab-Muslim background, etc.
    • It can be a complicated combination of these as well as other confounding factors.

165 of 221

Why does a model learn these biases?

  • Training from data
    • Using datasets
      • Ex. Twitter, Facebook, Reddit, Washington Post Comments, etc
    • Conversations on the Internet
  • All conversations are biased, so any model we learn will pick up that bias.

  • Annotation quality checks can be used to control the bias in the training dataset, but it is impossible to remove it completely, especially when training at scale.

How to learn an unbiased model from biased conversations?

166 of 221

Impact of biased predictions

  • Not being able to build unbiased prediction systems can lead to low-quality unfair results for victim communities.
  • This unfairness can propagate into government/organizational policy making

Examples of incorrect predictions from Google’s Perspective API (as of 15th Aug 2018):

  • “Those guys are nerds” 🡪 Hateful (0.83)
  • “Can you throw that garbage please” 🡪 Hateful (0.74)
  • “People will die if they kill Obamacare” 🡪 Hateful (0.78)
  • “Oh shit. I did that mistake again” 🡪 Hateful (0.91)
  • “that arab killed the plants” 🡪 Hateful (0.87)
  • “I support gay marriage. I believe they have a might to be as miserable as the rest of us.” 🡪 Hateful (0.77)

167 of 221

Mitigating Bias in Learning

Goal:

  • The model is fair towards all ethnic groups, minorities and genders.

  • Bias from social media is not learnt.

168 of 221

Choices for Bias Mitigation

Statistical Correction: Includes techniques that attempt to uniformly distribute the samples of every kind in all the target classes, altering the train set with samples to balance the term usage across the classes.

Example: Strategic Sampling, Data Augmentation

Ex. “This is a hateful sentence for muslim” 🡪 +ve

Ex. “This is NOT a hateful sentence for muslim” 🡪 -ve

Limitations: Not always possible to create balanced samples for all the keywords

169 of 221

Choices for Bias Mitigation

Statistical Correction:

Example: Adversarial Filters of Dataset Biases (Bras et al., ICML 2020)

An iterative greedy algorithm that adversarially filters biased examples from the training dataset, producing a de-biased version of the dataset.
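A much-simplified sketch in the spirit of this adversarial filtering idea: weak linear models are repeatedly trained on random subsets of pre-computed feature vectors, each held-out example receives a predictability score, and the most easily predictable examples, the ones most likely solvable from dataset artefacts alone, are dropped. Round counts, thresholds, and feature choice are assumptions, not the paper's exact procedure.

import numpy as np
from sklearn.linear_model import LogisticRegression

def adversarial_filter(X, y, n_rounds=5, n_models=8, subset_frac=0.5, drop_k=100):
    keep = np.arange(len(y))                       # indices still in the dataset
    for _ in range(n_rounds):
        correct = np.zeros(len(keep))
        counted = np.zeros(len(keep))
        for _ in range(n_models):
            idx = np.random.rand(len(keep)) < subset_frac
            held = ~idx
            if idx.sum() < 2 or held.sum() == 0 or len(set(y[keep[idx]])) < 2:
                continue
            clf = LogisticRegression(max_iter=200).fit(X[keep[idx]], y[keep[idx]])
            preds = clf.predict(X[keep[held]])
            correct[held] += (preds == y[keep[held]])
            counted[held] += 1
        predictability = np.divide(correct, np.maximum(counted, 1))
        # Drop the examples that were almost always classified correctly.
        order = np.argsort(-predictability)
        keep = np.delete(keep, order[:drop_k])
    return keep  # indices of the de-biased subset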

170 of 221

Choices for Bias Mitigation

Model Correction: Make changes to the model like modifying word embeddings or debiasing during model training

Example: Ensemble Learning

[Diagram: an ensemble of black-box models (Model 1, Model 2, Model 3) whose combined prediction is used.]

171 of 221

Choices for Bias Mitigation

Model Correction: Make changes to the model like modifying word embeddings or debiasing during model training

Example: Adversarial Learning (Xia et al., 2020)

Limitations: Needs labels for all the private attributes that we want to correct for.

[Diagram: the input sentence feeds a shared model with two heads: a hate-speech head (Hateful?) and, through a gradient reversal layer (GRL), a private-attribute head (e.g., gender). The model learns to identify hate speech but NOT the gender.]
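A minimal PyTorch sketch of the gradient-reversal idea in the diagram above: the shared encoder feeds both a hate-speech head and a private-attribute head, but gradients from the attribute head are reversed before reaching the encoder, pushing the encoder towards attribute-invariant features. Layer sizes and names are illustrative assumptions, not the exact architecture of Xia et al.

import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

class DebiasedClassifier(nn.Module):
    def __init__(self, input_dim=768, hidden=256, n_attr=2, lambd=1.0):
        super().__init__()
        self.lambd = lambd
        self.encoder = nn.Sequential(nn.Linear(input_dim, hidden), nn.ReLU())
        self.hate_head = nn.Linear(hidden, 2)       # hateful vs. not
        self.attr_head = nn.Linear(hidden, n_attr)  # private attribute, e.g. gender

    def forward(self, x):
        h = self.encoder(x)
        hate_logits = self.hate_head(h)
        # Reverse gradients so the encoder "unlearns" the private attribute.
        attr_logits = self.attr_head(GradReverse.apply(h, self.lambd))
        return hate_logits, attr_logits

# Training: total loss = CE(hate_logits, hate_labels) + CE(attr_logits, attr_labels);
# minimising it pushes the encoder towards attribute-invariant representations.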

172 of 221

Choices for Bias Mitigation

Model Correction:

Example: Statistical model re-weighting (Utama et al., 2020)

An input example that contains lexical-overlap bias is predicted as entailment by the teacher model with high confidence. When the biased model predicts such an example well, the teacher's output distribution is re-scaled to indicate higher uncertainty (lower confidence). The re-scaled output distributions are then used to distill the main model (see the sketch below).
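A hedged sketch of the re-scaling described above: when a shallow biased model is confident on the gold label, the teacher's soft targets are flattened before being used to distill the main model. The particular scaling function (exponentiation by 1 - bias confidence, then renormalisation) is an illustrative choice, not necessarily the exact formulation of Utama et al.

import torch
import torch.nn.functional as F

def rescale_teacher(teacher_probs, bias_conf_on_gold):
    """teacher_probs: (batch, n_classes); bias_conf_on_gold: (batch,) in [0, 1]."""
    exponent = (1.0 - bias_conf_on_gold).unsqueeze(1)   # confident bias -> flatter target
    scaled = teacher_probs ** exponent
    return scaled / scaled.sum(dim=1, keepdim=True)

def distillation_loss(student_logits, teacher_probs, bias_conf_on_gold):
    soft_targets = rescale_teacher(teacher_probs, bias_conf_on_gold)
    return F.kl_div(F.log_softmax(student_logits, dim=1), soft_targets,
                    reduction="batchmean")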

173 of 221

Choices for Bias Mitigation

Data Correction: Focuses on converting the samples to a simpler form by reducing the amount of information available to the classifier during the learning stage.

Example: Private-attribute masking, Knowledge generalization (Badjatiya et al., 2019)

Ex. This is a hateful sentence for muslim → This is a hateful sentence for ########

→ Can we do better?

174 of 221

Choices for Bias Mitigation

  • Replacing with Part-of-speech (POS) tags
    • Example: Muhammad set the example for his followers, and his example shows him to be a cold-blooded murderer.
    • Replace the word ‘Muhammad’ with POS tag ‘<NOUN>’
  • Replacing with Named-entity (NE) tags
    • Example: Mohan is a rock star of Hollywood
    • Replace the entities with tags <PERSON> and <ORGANIZATION> respectively
  • Replacing with WordNet generalizations (Badjatiya et al., 2019); a sketch of these replacements follows below
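The sketch below approximates the three replacement strategies with off-the-shelf tools (spaCy for POS/NER, NLTK WordNet for hypernym generalisation); it is a toy illustration rather than the exact pipeline of Badjatiya et al. (2019), and it assumes en_core_web_sm and the WordNet corpus are installed.

import spacy
from nltk.corpus import wordnet as wn   # requires: nltk.download('wordnet')

nlp = spacy.load("en_core_web_sm")

def generalize(text):
    doc = nlp(text)
    out = []
    for tok in doc:
        if tok.ent_type_:                       # named entity -> NE tag
            out.append(f"<{tok.ent_type_}>")
        elif tok.pos_ in {"NOUN", "PROPN"}:     # content word -> WordNet hypernym, else POS tag
            synsets = wn.synsets(tok.text.lower(), pos=wn.NOUN)
            hypernyms = synsets[0].hypernyms() if synsets else []
            out.append(hypernyms[0].lemma_names()[0] if hypernyms else f"<{tok.pos_}>")
        else:
            out.append(tok.text)
    return " ".join(out)

print(generalize("Mohan is a rock star of Hollywood"))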

175 of 221

Knowledge-based Generalizations

WordNet Hierarchy

176 of 221

Challenges and Limitations

  • The problem is still not solved; bias is prominent in almost all learning algorithms.
  • It is nearly impossible to mitigate all the biases.
  • We need automated mitigation techniques that work at scale, as biases could be based on unknown attributes.

177 of 221

Current Trends: Hate Speech Keeping Up with NLP

Preslav Nakov

178 of 221

Multi-Class Datasets

  • Classical Binary classification
    • Hate vs Non-hate
  • Waseem
    • Racism, Sexism, Neither
  • Davidson
    • Hate, Offense, Neither
  • Founta
    • Hate, Abuse, Spam, None
  • Kaggle Toxicity Challenge
    • Toxic, Severe Toxic, Obscene, Threat, Insult, Identity Hate
    • Identity-based labels including [female, christian, muslim, white, black, homosexual, asian, jewish, transgender].
  • OffensEval
    • 3-level schema

179 of 221


SemEval-2019 Task 6: Identifying and Categorizing Offensive Language in Social Media (OffensEval).

Marcos Zampieri, Shervin Malmasi, Preslav Nakov, Sara Rosenthal, Noura Farra, Ritesh Kumar

180 of 221

The OLID Hierarchy


NAACL-HLT'2019: Predicting the Type and Target of Offensive Posts in Social Media.

Marcos Zampieri, Shervin Malmasi, Preslav Nakov, Sara Rosenthal, Noura Farra, Ritesh Kumar


184 of 221

Fine-Grained Hate Speech: OLID Dataset

  • Dataset presented as the official dataset for OffensEval 2019.
  • Crowdsourced Hierarchical Annotation of Tweet Texts

  • Level A (Content Type): Offensive, Non-Offensive
    • Level B (Offense Type): Targeted, Untargeted
      • Level C (Target Type): Individual, Group, Others

https://aclanthology.org/N19-1144/

NAACL-HLT'2019: Predicting the Type and Target of Offensive Posts in Social Media.

Marcos Zampieri, Shervin Malmasi, Preslav Nakov, Sara Rosenthal, Noura Farra, Ritesh Kumar

185 of 221

Fine-Grained Hate Speech: OLID Dataset

Level A

https://aclanthology.org/N19-1144/

  • A CNN-based approach works best across all 3 tasks.
  • All models are trained separately for each level.
  • Performance drops when moving from coarse-grained to fine-grained labels.

NAACL-HLT'2019: Predicting the Type and Target of Offensive Posts in Social Media.

Marcos Zampieri, Shervin Malmasi, Preslav Nakov, Sara Rosenthal, Noura Farra, Ritesh Kumar

186 of 221

Fine-Grained Hate Speech: OLID Dataset

Level C

https://aclanthology.org/N19-1144/

Level B

NAACL-HLT'2019: Predicting the Type and Target of Offensive Posts in Social Media.

Marcos Zampieri, Shervin Malmasi, Preslav Nakov, Sara Rosenthal, Noura Farra, Ritesh Kumar

187 of 221

SOLID: Semi-Supervised Extension

Used heterogeneous machine learning models that have diverse inductive biases:

    • PMI
    • FastText
    • LSTM
    • BERT

Then, applied democratic co-training to generate semi-supervised labels using OLID as a seed dataset and distant supervision using the ensemble of models above.
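A tiny sketch of this distant-supervision step: each unlabeled tweet is scored by every model in the heterogeneous ensemble, and the average confidence plus the disagreement (standard deviation) are stored as the semi-supervised signal. The scorer callables are placeholders for the PMI, FastText, LSTM, and BERT models; the aggregation shown is an assumption in the spirit of the paper, not its exact recipe.

from statistics import mean, stdev

def distant_supervision(unlabeled_texts, scorers):
    """scorers: callables mapping text -> P(offensive); returns (text, avg, std) triples."""
    labeled = []
    for text in unlabeled_texts:
        scores = [score(text) for score in scorers]
        # Keep the average confidence and the ensemble disagreement; downstream
        # training can filter or weight examples by the spread.
        labeled.append((text, mean(scores), stdev(scores)))
    return labeled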


ACL-2021 (Findings): SOLID: A Large-Scale Semi-Supervised Dataset for Offensive Language Identification.

Sara Rosenthal, Pepa Atanasova, Georgi Karadzhov, Marcos Zampieri, Preslav Nakov

188 of 221

Data Statistics


ACL-2021 (Findings): SOLID: A Large-Scale Semi-Supervised Dataset for Offensive Language Identification.

Sara Rosenthal, Pepa Atanasova, Georgi Karadzhov, Marcos Zampieri, Preslav Nakov


189 of 221

The Impact of Adding SOLID


ACL-2021 (Findings): SOLID: A Large-Scale Semi-Supervised Dataset for Offensive Language Identification.

Sara Rosenthal, Pepa Atanasova, Georgi Karadzhov, Marcos Zampieri, Preslav Nakov


190 of 221


SemEval-2020 Task 12: Multilingual Offensive Language Identification in Social Media (OffensEval 2020).

Marcos Zampieri, Preslav Nakov, Sara Rosenthal, Pepa Atanasova, Georgi Karadzhov, Hamdy Mubarak, Leon Derczynski, Zeses Pitenis, Çagri Çöltekin


191 of 221

OffensEval 2020: Level A (all languages)


SemEval-2020 Task 12: Multilingual Offensive Language Identification in Social Media (OffensEval 2020).

Marcos Zampieri, Preslav Nakov, Sara Rosenthal, Pepa Atanasova, Georgi Karadzhov, Hamdy Mubarak, Leon Derczynski, Zeses Pitenis, Çagri Çöltekin

192 of 221

Neighborhood-Based Content Flagging


TACL (2022): A Neighbourhood Framework for Resource-Lean Content Flagging.

Sheikh Muhammad Sarwar, Dimitrina Zlatkova, Momchil Hardalov, Yoan Dinkov, Isabelle Augenstein, Preslav Nakov

193 of 221

Neighborhood-Based Content Flagging

[Diagram: a query post is paired with labeled neighbours; the model learns that an abusive query should entail abusive neighbours and contradict benign ones (and vice versa), and the resulting neighbourhood representation drives the Flagged / Non-Flagged decision.]

TACL (2022): A Neighbourhood Framework for Resource-Lean Content Flagging.

Sheikh Muhammad Sarwar, Dimitrina Zlatkova, Momchil Hardalov, Yoan Dinkov, Isabelle Augenstein, Preslav Nakov
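A simplified nearest-neighbour rendition of the idea in the diagram: embed the query and a pool of already-moderated posts, retrieve the most similar neighbours, and let their flags vote. The actual framework models entail/contradict interactions between query and neighbours; this sketch, with an assumed sentence-transformers model name, only illustrates the retrieval view.

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

def flag_by_neighbors(query, neighbor_texts, neighbor_labels, k=5):
    q_emb = model.encode(query, convert_to_tensor=True)
    n_emb = model.encode(neighbor_texts, convert_to_tensor=True)
    sims = util.cos_sim(q_emb, n_emb)[0]
    top = sims.topk(min(k, len(neighbor_texts))).indices.tolist()
    votes = sum(neighbor_labels[i] for i in top)   # labels: 1 = flagged, 0 = benign
    return int(votes > len(top) / 2)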

194 of 221

Neighborhood-Based Content Flagging


TACL (2022): A Neighbourhood Framework for Resource-Lean Content Flagging.

Sheikh Muhammad Sarwar, Dimitrina Zlatkova, Momchil Hardalov, Yoan Dinkov, Isabelle Augenstein, Preslav Nakov

195 of 221

HateBERT

  • BERT, trained on RAL-E
    • the Reddit Abusive Language English dataset
    • potentially harmful content from banned or controversial Reddit communities. (1M+ messages)
  • Re-trained BERT-base with MLM
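A hedged Hugging Face sketch of the same recipe: continue masked-language-model training of bert-base on an abusive-language corpus. The file path, hyper-parameters, and dataset handling are placeholders; in practice the released checkpoint can also be loaded directly from the model hub (e.g. GroNLP/hateBERT).

from transformers import (AutoTokenizer, AutoModelForMaskedLM,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)
from datasets import load_dataset

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

# Placeholder corpus of Reddit messages, one message per line.
raw = load_dataset("text", data_files={"train": "ral_e_reddit_messages.txt"})
tokenized = raw.map(lambda x: tokenizer(x["text"], truncation=True, max_length=128),
                    batched=True, remove_columns=["text"])

collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)
trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="hatebert-retrain", num_train_epochs=1,
                           per_device_train_batch_size=32),
    train_dataset=tokenized["train"],
    data_collator=collator,
)
trainer.train()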

WOAH@ACL-2021: HateBERT: Retraining BERT for Abusive Language Detection in English

Tommaso Caselli, Valerio Basile, Jelena Mitrović, Michael Granitzer

196 of 221

Hate Speech Detection Using GPT-3 Prompts

Zero-shot

https://beta.openai.com/playground/p/BjTry9NqZqLebAnYnRmnuD57?model=davinci

One-shot

https://beta.openai.com/playground/p/QcqZSdfFPCei0ae5ePJkK1va?model=davinci

Few-shot

https://beta.openai.com/playground/p/4Qsizf82t07oMVJZiZrg9KXM?model=davinci

Hate Speech Detection via GPT-3 Prompts: https://arxiv.org/pdf/2103.12407.pdf

197 of 221

HATECHECK: Functional Tests for Hate Speech


ACL’2021: HateCheck: Functional Tests for Hate Speech Detection Models.

Paul Röttger, Bertram Vidgen, Dong Nguyen, Zeerak Waseem, Helen Z. Margetts, Janet B. Pierrehumbert

198 of 221

HATECHECK: Functional Tests for Hate Speech


ACL’2021: HateCheck: Functional Tests for Hate Speech Detection Models.

Paul Röttger, Bertram Vidgen, Dong Nguyen, Zeerak Waseem, Helen Z. Margetts, Janet B. Pierrehumbert

199 of 221


Beyond Hate Speech:

Detecting Harmful Memes and Their Targets

ACL'2021 (Findings): Detecting Harmful Memes and Their Targets.

Shraman Pramanick, Dimitar Dimitrov, Rituparna Mukherjee, Shivam Sharma, Md. Shad Akhtar, Preslav Nakov, Tanmoy Chakraborty


  • Fake news
  • Hate speech
  • Attacks, sarcasm

200 of 221

Detecting Harmful Memes

and Their Targets


Problem 1 (Harmful meme detection):

      • very harmful
      • partially harmful
      • harmless

Problem 2 (Target identification of harmful memes):

  • individual
  • organization
  • community/country
  • society/general public/others

HarMeme: 3,544 memes related to COVID-19.


ACL'2021 (Findings): Detecting Harmful Memes and Their Targets.

Shraman Pramanick, Dimitar Dimitrov, Rituparna Mukherjee, Shivam Sharma, Md. Shad Akhtar, Preslav Nakov, Tanmoy Chakraborty


201 of 221

Beyond Hate Speech:

Detecting Harmful Memes and Their Targets

EMNLP'2021 (Findings): MOMENTA: A Multimodal Framework for Detecting Harmful Memes and Their Targets

Shraman Pramanick, Shivam Sharma, Dimitar Dimitrov, Md. Shad Akhtar, Preslav Nakov, Tanmoy Chakraborty


202 of 221

Beyond Hate Speech: Propaganda

  • “Expression deliberately designed to influence the opinions/actions of other individuals or groups with reference to predetermined ends.”

Institute for Propaganda Analysis


IJCAI-2020: A Survey on Computational Propaganda Detection.

Giovanni Da San Martino, Stefano Cresci, Alberto Barrón-Cedeño, Seunghak Yu, Roberto Di Pietro, Preslav Nakov

203 of 221

Beyond Hate Speech:

Fine-Grained Propaganda Detection

Dataset

  • 18 techniques
  • 350k words
  • 400 man hours
  • 7.3k instances


EMNLP-2019: Fine-Grained Analysis of Propaganda in News Articles.

Giovanni Da San Martino, Seunghak Yu, Alberto Barrón-Cedeño, Rostislav Petrov, Preslav Nakov

204 of 221

Beyond Hate Speech:

Fine-Grained Propaganda Detection


EMNLP-2019: Fine-Grained Analysis of Propaganda in News Articles.

Giovanni Da San Martino, Seunghak Yu, Alberto Barrón-Cedeño, Rostislav Petrov, Preslav Nakov

205 of 221


ACL-2020 (best demo award, honorable mention): Prta: A System to Support the Analysis of Propaganda Techniques in the News.

Giovanni Da San Martino, Shaden Shaar, Yifan Zhang, Seunghak Yu, Alberto Barrón-Cedeño, Preslav Nakov

206 of 221


Beyond Hate Speech:

Shared Tasks on Propaganda Techniques

207 of 221

Techniques in text:

{
  "id": "125",
  "labels": ["Loaded Language", "Name calling/Labeling"],
  "text": "I HATE TRUMP\n\nMOST TERRORIST DO"
}

Techniques in text+image:

{
  "id": "125",
  "labels": ["Reductio ad hitlerum", "Smears", "Loaded Language", "Name calling/Labeling"],
  "text": "I HATE TRUMP\n\nMOST TERRORIST DO",
  "image": "125_image.png"
}

Beyond Hate Speech:

SemEval-2021: Propaganda Techniques in Memes


SemEval-2021: SemEval-2021 Task 6: Detection of Persuasive Techniques in Texts and Images

Dimitar Dimitrov, Bishr Bin Ali, Shaden Shaar, Firoj Alam, Fabrizio Silvestri, Hamed Firooz, Preslav Nakov and Giovanni Da San Martino

208 of 221

Beyond Hate Speech:

Propaganda Detection in Memes


Appeal to Fear; Black & White Fallacy

Whataboutism

ACL-2021 (Findings): Detecting Propaganda Techniques in Memes.

Dimitar Dimitrov, Bishr Bin Ali, Shaden Shaar, Firoj Alam, Fabrizio Silvestri, Hamed Firooz, Preslav Nakov and Giovanni Da San Martino

209 of 221

Beyond Hate Speech: Policies of Big Tech


ArXiv 2021: Detecting Abusive Language on Online Platforms: A Critical Analysis.

Preslav Nakov, Vibha Nayak, Kyle Dent, Ameya Bhatawdekar, Sheikh Muhammad Sarwar, Momchil Hardalov, Yoan Dinkov, Dimitrina Zlatkova, Guillaume Bouchard, Isabelle Augenstein

210 of 221

Beyond Hate Speech:

Policies of Product-Specific Platforms


ArXiv 2021: Detecting Abusive Language on Online Platforms: A Critical Analysis.

Preslav Nakov, Vibha Nayak, Kyle Dent, Ameya Bhatawdekar, Sheikh Muhammad Sarwar, Momchil Hardalov, Yoan Dinkov, Dimitrina Zlatkova, Guillaume Bouchard, Isabelle Augenstein

211 of 221

Zero-Shot Classification

  • Fine-tune an existing transformer model.
  • Experiment with various classification heads, e.g., FNN, CNN-pooling, BiLSTM.
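A minimal sketch of this setup, assuming a frozen multilingual BERT encoder and a BiLSTM classification head (an FNN or CNN-pooling head would be plugged in analogously); model name and dimensions are illustrative assumptions, not the cited paper's exact configuration.

import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class FrozenTransformerClassifier(nn.Module):
    def __init__(self, model_name="bert-base-multilingual-cased", hidden=128):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name)
        for p in self.encoder.parameters():
            p.requires_grad = False            # keep the language model frozen
        dim = self.encoder.config.hidden_size
        self.bilstm = nn.LSTM(dim, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, 2)    # hate vs. non-hate

    def forward(self, input_ids, attention_mask):
        with torch.no_grad():                  # only the head receives gradients
            reps = self.encoder(input_ids=input_ids,
                                attention_mask=attention_mask).last_hidden_state
        seq, _ = self.bilstm(reps)
        return self.out(seq[:, 0, :])          # classify from the first position

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = FrozenTransformerClassifier()
batch = tokenizer(["you people disgust me"], return_tensors="pt", padding=True)
logits = model(batch["input_ids"], batch["attention_mask"])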

Cross-lingual Zero- and Few-shot Hate Speech Detection utilising frozen Transformer Language Models and AXEL: https://arxiv.org/pdf/2004.13850.pdf

212 of 221

Zero-Shot Classification via BERT

  • Models were further pre-trained on hateful text; however, they did not improve over simply fine-tuned models.
  • This gap in F1-scores is unexpected, as the intention of further training the language models on domain-specific data was to improve their understanding of hateful language.
  • Similar results were obtained for a large dataset like Founta.

Using Transfer-based Language Models to Detect Hateful and Offensive Language Online: https://aclanthology.org/2020.alw-1.3/

213 of 221

HateBERT: Retraining BERT for Abusive Language Detection in English

  • Obtain unlabelled samples of potentially harmful content from Banned or Controversial Reddit Communities. (Curated 1M+ messages)
  • Re-trained BERT base for Masked Language Modeling Task

Fine-tuned results comparison

Fine-tuned results comparison (cross- dataset training and testing)

HateBERT: Retraining BERT for Abusive Language Detection in English: https://arxiv.org/abs/2010.12472

214 of 221

Hate Speech Detection via GPT-3 Prompts

  • LMs are known to return toxic responses, especially when generating content about vulnerable groups.
  • Can they be used to detect hateful content as well?

Hate Speech Detection via GPT-3 Prompts: https://arxiv.org/pdf/2103.12407.pdf
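An illustrative few-shot prompt in the spirit of the zero/one/few-shot playground examples linked earlier (the exact prompts used in the cited work differ); the string would be sent to a GPT-3 completions endpoint (e.g. the davinci model) and the generated "Yes"/"No" read off as the prediction. The example posts are made up.

FEW_SHOT_PROMPT = """Decide whether each post contains hate speech.

Post: "I can't stand people from that country, send them all back."
Hate speech: Yes

Post: "The weather in Doha is lovely this time of year."
Hate speech: No

Post: "{post}"
Hate speech:"""

def build_prompt(post: str) -> str:
    # The filled prompt is what gets submitted to the completion endpoint.
    return FEW_SHOT_PROMPT.format(post=post)

print(build_prompt("all of them are criminals and should disappear"))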

215 of 221

Cross lingual Hate Speech Detection

  • When a model is trained on a specific language and tested on the same language, the F1 score for hate detection is in the range of 0.72-0.74.
  • When the datasets are merged into a combined-domain dataset (training on samples containing both English and Dutch), performance on the pure English and pure Dutch test sets drops to 0.60.

Exposing the limits of Zero-shot Cross-lingual Hate Speech Detection:https://aclanthology.org/2021.acl-short.114/

216 of 221

Cross lingual Hate Speech Detection

  • Languages covered in training and testing: English, Italian, Spanish, using existing HatEval datasets.
  • Makes use of the multilingual transformers mBERT and XLM-R.
  • A high score from an overfitted hashtag overshadows the positive influence of non-hateful terms, causing the overall prediction to be hateful.
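A hedged sketch of the zero-shot transfer setup: fine-tune xlm-roberta-base on labelled data in one language (e.g. English HatEval), then evaluate the very same model on another language with no target-language training data. Dataset loading and the training loop are placeholders.

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModelForSequenceClassification.from_pretrained("xlm-roberta-base", num_labels=2)

def predict(texts):
    batch = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    with torch.no_grad():
        logits = model(**batch).logits
    return logits.argmax(dim=-1).tolist()

# 1) Fine-tune `model` on English (text, label) pairs with a standard training loop.
# 2) Zero-shot transfer: run `predict` on Spanish / Italian test sets and report F1.
print(predict(["odio a toda esa gente", "me encanta este lugar"]))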

Exposing the limits of Zero-shot Cross-lingual Hate Speech Detection:https://aclanthology.org/2021.acl-short.114/

217 of 221

Limitations

  • Producing large-scale annotated datasets for fine-grained targets is not easy.
  • mBERT and XLM-R are not able to capture language-specific taboos, leading to higher false positives in zero-shot cross-lingual transfer.
  • They do not transfer uniformly to different hate speech targets and types.

Exposing the limits of Zero-shot Cross-lingual Hate Speech Detection:https://aclanthology.org/2021.acl-short.114/

218 of 221

Concluding Remarks

219 of 221

Key Takeaways

  • Datasets used for hate speech:
    • There is a diversity of data labels, with limited overlap/uniformity
    • Skewed in favour of English textual content.
  • Methods used for hate speech detection:
    • A vast array of techniques, from classical ML to prompt-based zero-shot learning, has been tested.
    • Out-of-domain performance is abysmal in most cases.
    • Need to move towards lifelong learning and dynamic catchphrase detection methods.
    • Study of the offline hate incidents that result from online hate.
  • Methods used for hate speech diffusion:
    • Very little work on predictive modeling of the spread of hate; API bottlenecks hinder the curation of large-scale studies.
    • Not all platforms expose a publicly available follower network; how do we model diffusion in such scenarios?
  • Psychological traits of hate speech spreaders
  • Hate speech intervention:
    • Improvements in NLG will help in downstream tasks like hate speech intervention.
    • Hate speech NLG heavily depends on context (geographical, cultural, temporal, etc.); how can we incorporate that knowledge in an evolving manner?
    • Early detection and prevention within the network is an active area of research.
  • Bias in hate speech:
    • How to reduce annotation bias in the first place?
    • Do biases transfer across domains?

220 of 221

Future Scope

  • How to combine detection and diffusion?
  • More work on low-resource languages needed
  • Knowledge-aware hate speech detection
  • Better intervention strategies
  • Handling false negatives (implicit hate)
  • Multimodal hate speech
  • How can psychological traits help predict hate speech diffusion?
  • Language-agnostic and topic-agnostic hate speech
  • Model sensitivity analysis
  • Explainable hate speech classifier
  • Multilingual and cross-lingual hate speech
  • From harmful content to hateful? [2]


221 of 221

Thanks Q&A