1 of 56

Beyond Hostility: Detecting Subtle and Overt Forms of Online Conflict with Multi-Objective Learning

Joemon M Jose, School of Computing Science

gair-lab.github.io/

2 of 56

Collaborators & Credits

  • Oliver Warke – PhD student
  • Dr Jan Breitsohl, School of Business Management
  • Capturing the Spectrum of Social Media Conflict: A Novel Multi-objective Classification Model
    • O Warke, JM Jose, J Breitsohl, J Wang - Proceedings of the ACM SIGIR ICTIR 2024
  • Beyond Hostility: Detecting Subtle and Overt Forms of Online Conflict with Multi-Objective Learning
    • Under Review, Information Retrieval Research Journal

3 of 56

Outline

  • What is conflict?
  • Conflict Datasets
  • State of the Art on Hate Detection
  • Models for conflict detection
  • Thematic Analysis
  • Conclusion

4 of 56

Negative behaviours on social media

  • Social media users communicate their
    • opinions, views, and thoughts to a broad global audience.

  • Consequently, increasing negative behaviours and interactions
    • have been observed on social media platforms.

5 of 56

Hate Example

  • Nike is an industry and reflects capitalism. Now please take your stupid moral issues and go somewhere else where people actually care, nobody asked for your incredibly acute revelation.
  • Exactly Sasha-well said. great to see people care about morals unlike some dunces on here
  • Lol you sound extra dumb. Are you going to sit here and tell me honestly with your dumb face, that Alien Soldier doesn't look like a good game? Are you people really that caught up on what's well known that you’d downplay a great looking game like that for Genesis Mortal Kombat?
  • @Nike You believe in this scam? Australian open banned the current #1 player to compete? No goat in my eyes if he didn't challenge against the #1 player.
  • Hahaaa... it's his choice NoVax. Get over it. Rafa won the 21st.

Toxic Behaviours

6 of 56

Consequences

    • A toxic online environment has severe consequences for the well-being of individuals and communities.

    • Significant negative effects on users
      • Mental health, anxiety

  • Factors contributing to the escalation of these detrimental behaviours:
    • the absence of tangible physical and social cues, anonymity, and
    • the absence of accountability for users
      • who propagate harmful content
Reginald Gonzales. Social media as a channel and its implications on cyber bullying. In DLSU Research Congress, pages 1–7, 2014.

7 of 56

Conflict (Different shades of Hate)

  • The definition of conflict often depends on the lens through which it is viewed, such as
    • psychology, sociology, political science, or law.

  • Conflict is broadly defined as a situation in which two or more parties perceive a divergence of interests, goals, or values that may lead to opposition, tension, or incompatibility.
    • It can arise in various contexts—interpersonal, organizational, societal, or international—and can involve individuals, groups, or nations.

8 of 56

Hate vs Conflict

9 of 56

Brand communities

  • Brand communities are pages, groups, and timelines controlled by brands which provide the opportunity for consumers to interact with the brand
  • Brands
    • Companies on the platform
    • Adidas
    • Nike
  • Platform
    • Twitter
    • Reddit
    • Facebook

10 of 56

Thinking from a platform or brand perspective

  • Decreased Engagement: Users may reduce their activity or leave the platform entirely due to a hostile environment.

  • Loss of Credibility: A platform dominated by negativity risks losing its reputation as a space for productive discourse.

  • Erosion of Trust: Negative interactions erode trust in online platforms as reliable sources of information or community engagement.

  • Influence on Public Discourse: High-profile negative interactions can distort public discussions on important issues, focusing more on conflict than substance.

  • Increased Moderation Challenges: Platforms like Twitter must invest significant resources to monitor and curb negative behaviour, which can be resource-intensive.

11 of 56

Consequences of Negative Interaction - 1

  • From a business perspective we can consider brand perception
    • What is the brand's stance on the negative interaction?
    • How does the negative interaction reflect on the brand's current marketing strategies?

  • Is there an argument for an ethical responsibility for brands to act on negative interactions?

  • Responsibility is a debated point between brands and platforms
    • Who is responsible for the interactions on social media pages?
    • The platform which the page is hosted on or the brand running the page?


12 of 56

Consequences of Negative Interaction - 2

  • Negative interactions have negative impacts for a brand.

    • Brands should take an active role in addressing negative interactions.

    • Brand intervention has a positive effect for both the brand and the consumer.

13 of 56

Platform changes

  • Facebook is abandoning its stricter content policies in favour of more lax content controls
    • In January 2025, Mark Zuckerberg, the CEO of Meta, announced that Facebook would be removing fact-checkers and moderators.
    • Zuckerberg accepted that there would be more misinformation and hate on the platform, but argued it would be a worthwhile trade-off.
  • Major social media sites are increasingly delegating content oversight to users.
    • Zuckerberg stated that they would rely on users reporting content violations before taking action.
  • X (Twitter) has also removed much of its content moderation team, raising concern about the increase in harmful content on the platform.

14 of 56

Conflict Datasets

15 of 56

Need for a fine-grained conflict dataset

  • Existing datasets predominantly concentrate on overtly
    • antagonistic manifestations of adverse interactions,
    • thus disregarding the complete spectrum of conflict.

  • Hence, there is a need
    • to develop datasets that reflect this broader range.

Existing label spaces:
  • Davidson et al: Hate speech, Offensive language, Normal
  • Fortuna et al: Abusive, Hateful, Normal

16 of 56

Brand Communities

  • Brand communities are pages, groups, and timelines controlled by brands
    • which provide the opportunity for consumers to interact with the brand and each other
  • Our collaborators employed a non-participatory netnographic approach, observing online social interactions.
  • Throughout this investigative period, the researchers meticulously examined and classified many consumer conflicts, aiming to understand the diverse manifestations of such conflicts.

Jan Breitsohl, Holger Roschk, and Christina Feyertag. Consumer brand bullying behaviour in online communities of service firms. Service Business Development: Band 2., pages 289–312, 2018.

17 of 56

Broad spectrum of conflicts

  • Social Science collaborators adopted a dual-coding methodology to uphold the integrity of the annotation procedure,
    • involving the active participation of two social science researchers.
  • The initial phase encompassed the first researcher’s deductive identification of incidents involving consumer conflicts.
  • An independent analysis of the data was then undertaken by a second researcher.
  • Both researchers subsequently engaged in a comprehensive assessment of the reliability and applicability of their respective analyses.
  • This deliberation was accompanied by extensive discussions to resolve any divergence in their interpretations.

18 of 56

Broad spectrum of negative behaviours

  • This rigorous and meticulous process culminated in identifying and classifying six distinct categories of conflict:
    • 'Teasing', 'Criticism', 'Sarcasm', 'Trolling', 'Harassment', and 'Threats'.
  • These categories span a spectrum of conflict severity and serve as the label space for all models evaluated in this study.
  • Teasing, Criticism, and Sarcasm were selected as the less hostile forms of conflict,
  • whilst Trolling, Harassment, and Threats were denoted as the more hostile forms of conflict.

19 of 56

Rationale

  • Harm is almost exclusively the motive of those conducting trolling, harassment, and threats.
    • These three classes have a clear motivation to cause severe distress to the recipient and are therefore definitively placed within the extreme forms of conflict.
  • Sarcasm and teasing can both be considered lighter forms of conflict, as there is a significant presence of humour in both,
    • although they still have the potential to cause harm.
  • Sarcasm and teasing are two separate entities,
    • though similarities exist to a certain extent.
  • Criticism borders the line between light and severe conflict;
    • although it can be delivered in a constructive manner, it can also be received as an insult.
  • Criticism was seen as a socially acceptable form of interaction which,
    • although not always the priority of the perpetrator, can result in harm to the recipient.

20 of 56

Datasets

Lesser Conflicts                 Extreme Conflicts
Class       Datapoints           Class        Datapoints
Teasing     208                  Trolling     1089
Criticism   698                  Harassment   1098
Sarcasm     577                  Threats      482

21 of 56

Dataset Characteristics

  • Average interclass similarity is 0.32
  • There is less distinction between some classes, such as 'Criticism' and 'Harassment',
    • which had a high similarity score of 0.71.

Class        Chars    Words    % stop words    No. of sentences
Teasing      78.5     14.6     0.31            1.6
Criticism    232.9    42.7     0.41            2.9
Sarcasm      58.9     10.7     0.32            1.1
Trolling     105.4    18.6     0.32            1.6
Harassment   130.2    23.9     0.35            2.2
Threats      275.7    50.6     0.31            5.5
Average      147.0    26.85    0.34            2.48

22 of 56

Related Work

23 of 56

Prior work - Deep learning algorithms

  • CNNs have demonstrated impressive capabilities across different text classification tasks.
  • HateClassify, a CNN-based framework for labelling social media content, achieved competitive multiclass accuracy.

Muhammad Khan, Assad Abbas, Attiqa Rehman, and Raheel Nawaz. Hateclassify: A service framework for hate speech identification on social media. IEEE Internet Computing, 25(1):40–49, 2020.

24 of 56

Hate

  • Salminen et al. detected hate across multiple platforms and found models with BERT features superior.

Joni Salminen, Maximilian Hopf, Shammur A Chowdhury, Soon-gyo Jung, Hind Almerekhi, and Bernard J Jansen. Developing an online hate classifier for multiple social media platforms. Human-centric Computing and Information Sciences, 10:1–34, 2020.

  • HateBERT, a retrained BERT model, outperformed the base BERT model in detecting abusive language: it received intensive pre-training on social media comments before being fine-tuned on domain-specific tasks.

Tommaso Caselli, Valerio Basile, Jelena Mitrović, and Michael Granitzer. Hatebert: Retraining bert for abusive language detection in english. arXiv preprint arXiv:2010.12472, 2020.

  • Overall, existing detection methods tend to focus on overtly hostile behaviours,
    • such as threats and harassment, while overlooking subtler forms like teasing, sarcasm, and criticism.

25 of 56

Datasets

Dataset label spaces:
  • Fortuna et al: Abusive, Hateful, Normal
  • Davidson et al: Offensive language, Hate speech, Normal

26 of 56

Models used

  • BERT
    • BERT-Base uncased pre-trained model with 12 layers, 12 heads, 768 hidden size, and 110M parameters.
  • GPT-2
    • model with 12 layers, 12 heads, 768 hidden size, and 117M parameters.
  • Flan-T5
    • A state-of-the-art generative language model which can be fine-tuned for text classification. Flan-T5 base model with 248M parameters, uploaded to the Huggingface repository by the Google team.

27 of 56

Models used

  • HateBERT
    • the default model provided by the authors with 12 layers, 12 heads, 768 hidden size, and 110M parameters.

  • DistilBERT
    • a lightweight variant of the base BERT model which has proven to be an excellent competitor to the traditional BERT model
    • reduces the size of a BERT model by 40%, while retaining 97% of its language understanding capabilities and being 60% faster

28 of 56

Performance

              Founta et al            Davidson et al          Conflict Dataset
Models        Acc   F1    R     P     Acc   F1    R     P     Acc   F1    R     P
BERT          0.77  0.76  0.77  0.78  0.87  0.86  0.86  0.86  0.69  0.61  0.61  0.65
HateBERT      0.78  0.73  0.73  0.73  0.86  0.86  0.86  0.87  0.55  0.52  0.52  0.52
DistilBERT    0.77  0.69  0.67  0.70  0.85  0.86  0.86  0.86  0.55  0.53  0.52  0.54
GPT-2         0.77  0.72  0.73  0.72  0.90  0.73  0.72  0.73  0.59  0.51  0.51  0.55
Flan-T5       0.77  0.76  0.76  0.77  0.87  0.87  0.86  0.88  0.67  0.60  0.61  0.61

29 of 56

Dataset challenges

  • A substantial challenge arises from the inherent interaction between distinct classes,
    • leading to heightened complexity in distinguishing and isolating specific class instances.
  • Within multi-class scenarios, cross-talk between classes introduces a significant hurdle, impacting the effectiveness of classification.
  • The conventional strategy of constructing test collections from social media,
    • especially exploiting distant supervision, exacerbates the challenge
    • due to the ambiguous, noisy, and error-prone constitution of social media data.

30 of 56

Decision Transformer

  • The Decision Transformer model introduces a pioneering strategy:
    • transforming reinforcement learning problems into sequential decision-making problems by utilising the Transformer architecture.

  • We take an innovative approach by exploiting the reward functionality of the Decision Transformer framework,
    • introducing a novel class-based reward computation mechanism.

  • The focal point of this reward function is to optimise class distances, with the overarching goal of enhancing classification performance.

31 of 56

Reward modelling

  • Given a text to classify, we compute the distance between each class and the text
  • Take the average of each class's embeddings and find the cosine distance to the text embedding
  • Distance values are scaled to 1 to 100
    • due to issues models like BERT have in handling decimals
  • The reward function encourages correct classification
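The steps above can be sketched as follows. This is a minimal illustration with hypothetical function names; class centroids are assumed to be precomputed as the mean embedding of each class, and the 1-100 scaling is a simple min-max rescale:

```python
import numpy as np

def class_distance_rewards(text_emb, class_embs):
    """Cosine distance between a text embedding and each class centroid,
    rescaled to the integer range 1-100 (small integers are easier for
    BERT-style tokenisers to handle than decimals)."""
    text = text_emb / np.linalg.norm(text_emb)
    cents = class_embs / np.linalg.norm(class_embs, axis=1, keepdims=True)
    dist = 1.0 - cents @ text                    # cosine distance per class
    lo, hi = dist.min(), dist.max()
    scaled = 1 + 99 * (dist - lo) / (hi - lo)    # min-max scale to [1, 100]
    return np.rint(scaled).astype(int)

# Toy example: three class centroids (rows) and one text embedding
rng = np.random.default_rng(0)
class_embs = rng.normal(size=(3, 8))
text_emb = class_embs[1] + 0.05 * rng.normal(size=8)  # close to class 1
rewards = class_distance_rewards(text_emb, class_embs)
print(rewards)  # the smallest value marks the nearest class
```

The reward signal can then favour the class with the smallest scaled distance, which is how correct classification is encouraged.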

32 of 56

Decision Transformer

33 of 56

Performance

              Founta et al            Davidson et al          Conflict Dataset
Models        Acc   F1    R     P     Acc   F1    R     P     Acc   F1    R     P
BERT          0.77  0.76  0.77  0.78  0.87  0.86  0.86  0.86  0.69  0.61  0.61  0.65
HateBERT      0.78  0.73  0.73  0.73  0.86  0.86  0.86  0.87  0.55  0.52  0.52  0.52
DistilBERT    0.77  0.69  0.67  0.70  0.85  0.86  0.86  0.86  0.55  0.53  0.52  0.54
GPT-2         0.77  0.72  0.73  0.72  0.90  0.73  0.72  0.73  0.59  0.51  0.51  0.55
Flan-T5       0.77  0.76  0.76  0.77  0.87  0.87  0.86  0.88  0.67  0.60  0.61  0.61
ConflictDT    0.77  0.77  0.78  0.78  0.89  0.88  0.88  0.88  0.71  0.63  0.64  0.62

34 of 56

BERT and ConflictDT class performance - conflict dataset

35 of 56

Reward variations

  • Distance between the text embedding and all classes
    • This reward function, 'distance between all classes', aimed to prioritize separating all classes within the dataset and could be applied to any classification task.
  • Distance between the text embedding and the lesser and more extreme forms of conflict
    • Designed to exploit common characteristics within these two groups, as identified during dataset analysis.
  • Distance between the 'Harassment' class and the text embedding
    • 'Harassment' datapoints were frequently misclassified into other classes and vice versa.
  • Distance between the text embedding and all classes, with sequential functionality

36 of 56

Performance

Models                           Accuracy  F-1   R     P
BERT                             0.69      0.61  0.61  0.65
HierBERT                         0.69      0.62  0.63  0.63
GPT-2                            0.59      0.51  0.51  0.55
Flan-T5                          0.67      0.60  0.61  0.61
Dist all classes (ConflictDT)    0.71      0.63  0.64  0.62
Dist Lesser and Greater classes  0.67      0.59  0.62  0.75
Dist Harassment                  0.68      0.61  0.63  0.67

37 of 56

ConflictDT class performance with Harassment and Lesser-Greater rewards on the conflict dataset.

38 of 56

Sequential ConflictDT

  • Subtle human conflict behaviours often depend on emphasis being placed on words or sentences occurring within a sequence.

39 of 56

The effects of reward functions

Models                           Accuracy  F-1   R     P
BERT                             0.69      0.61  0.61  0.65
HierBERT                         0.69      0.62  0.63  0.63
GPT-2                            0.59      0.51  0.51  0.55
Flan-T5                          0.67      0.60  0.61  0.61
Dist all classes (ConflictDT)    0.71      0.63  0.64  0.62
Dist Lesser and Greater classes  0.67      0.59  0.62  0.75
Dist Harassment                  0.68      0.61  0.63  0.67
Sequential no reward             0.68      0.60  0.60  0.74
Sequential with reward           0.69      0.62  0.62  0.64

40 of 56

Graph showing the distance between the logits of the predicted class and the logits of the true class changing over timesteps.

41 of 56

Knowledge Distillation

42 of 56

Knowledge Distillation (KD)

  • Transferring the "dark knowledge" from a large, high-performance "teacher" model to a smaller, more efficient "student" model.

Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531, 2015.

  • Hinton et al. demonstrated that by training the student to mimic the softened output probabilities (often referred to as "soft targets") of the teacher,
  • the student could achieve significantly improved performance compared to its baseline.

43 of 56

KD - modelling

  • Each training instance contributes two learning signals to the student model:
    • Hard label: The ground truth class provided by human annotators.
    • Soft supervision: Information extracted from the teacher via prompting.
  • where the hard labels ensure alignment to the ground truth annotations
  • whilst the soft supervision provides additional information from the teacher which is otherwise not available to the student.

44 of 56

Soft Target Distillation

  • Leveraging softened output distributions from the teacher to guide the student.
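As a minimal sketch in the style of Hinton et al., the soft-target loss combines a temperature-softened KL term with the usual hard-label cross-entropy. The temperature, weighting, and function names here are illustrative assumptions, not the exact configuration used in the study:

```python
import numpy as np

def softmax(z, T=1.0):
    z = np.asarray(z, dtype=float) / T
    z -= z.max()                       # numerical stability
    e = np.exp(z)
    return e / e.sum()

def soft_target_loss(student_logits, teacher_logits, T=2.0):
    """KL divergence between temperature-softened teacher and student
    distributions, scaled by T^2 so its gradient magnitude stays
    comparable to the hard-label loss (Hinton et al., 2015)."""
    p = softmax(teacher_logits, T)     # softened teacher "soft targets"
    q = softmax(student_logits, T)
    return T * T * float(np.sum(p * np.log(p / q)))

def distillation_loss(student_logits, teacher_logits, true_class,
                      alpha=0.5, T=2.0):
    """Weighted sum of hard-label cross-entropy and the soft-target KL term."""
    hard = -float(np.log(softmax(student_logits)[true_class]))
    soft = soft_target_loss(student_logits, teacher_logits, T)
    return alpha * hard + (1 - alpha) * soft

teacher = [4.0, 1.0, 0.5]   # a confident teacher
student = [2.0, 1.5, 0.5]   # a less certain student
print(distillation_loss(student, teacher, true_class=0))
```

The loss shrinks as the student's distribution approaches the teacher's, which is the training signal the soft targets provide.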

45 of 56

Feature Distillation using Probabilities

  • Introducing a regression head to align the student's probability vectors with those of the teacher.
  • MSE treats all errors equally, encouraging the student to match every probability with high fidelity.

46 of 56

Feature Distillation using Embeddings

  • Aligning the student’s internal hidden states with those of the teacher model
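A minimal sketch of this alignment: the student's hidden state is mapped to the teacher's (larger) hidden size via a linear projection, a common device assumed here for illustration, and the two are compared with MSE:

```python
import numpy as np

def embedding_distillation_loss(student_hidden, teacher_hidden, proj):
    """MSE between the teacher's hidden state and the student's hidden
    state after a (hypothetical, learnable) linear projection that maps
    the student's smaller hidden size up to the teacher's."""
    mapped = student_hidden @ proj        # (d_student,) @ (d_student, d_teacher)
    diff = mapped - teacher_hidden
    return float(np.mean(diff ** 2))

d_student, d_teacher = 4, 6
rng = np.random.default_rng(1)
proj = rng.normal(size=(d_student, d_teacher))
student_h = rng.normal(size=d_student)
teacher_h = student_h @ proj              # perfectly aligned case for the demo
print(embedding_distillation_loss(student_h, teacher_h, proj))
```

In training, the projection would be learned jointly with the student so that minimising this loss pulls the student's internal representations towards the teacher's.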

47 of 56

              Founta et al            Davidson et al          Conflict Dataset
Models        Acc   F1    R     P     Acc   F1    R     P     Acc   F1    R     P
BERT          0.77  0.76  0.77  0.78  0.87  0.86  0.86  0.86  0.69  0.61  0.61  0.65
HateBERT      0.78  0.73  0.73  0.73  0.86  0.86  0.86  0.87  0.55  0.52  0.52  0.52
DistilBERT    0.77  0.69  0.67  0.70  0.85  0.86  0.86  0.86  0.55  0.53  0.52  0.54
GPT-2         0.77  0.72  0.73  0.72  0.90  0.73  0.72  0.73  0.59  0.51  0.51  0.55
Flan-T5       0.77  0.76  0.76  0.77  0.87  0.87  0.86  0.88  0.67  0.60  0.61  0.61
ConflictDT    0.77  0.77  0.78  0.78  0.89  0.88  0.88  0.88  0.71  0.63  0.64  0.62
KDemb         0.83  0.83  0.83  0.84  0.91  0.90  0.90  0.90  0.72  0.67  0.67  0.67
KDlp          0.78  0.77  0.77  0.78  0.86  0.85  0.85  0.85  0.71  0.61  0.63  0.62
KDlcl         0.74  0.73  0.74  0.73  0.86  0.85  0.85  0.85  0.70  0.60  0.61  0.61

48 of 56

Performance-Cost Analysis

  • We record the time taken to execute training loops for each model and use this as a proxy for computational expense.
  • We apply the Green Algorithms carbon cost estimator introduced by

Loïc Lannelongue, Jason Grealey, and Michael Inouye. Green algorithms: quantifying the carbon footprint of computation. Advanced Science, 8(12):2100707, 2021.

  • It estimates the carbon footprint of any computational task by taking into account GPU type and count, memory usage, runtime, and platform.
  • We present an accounting of the environmental cost of training classification models, enabling comparison not only in terms of accuracy but also ecological impact.
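A back-of-envelope version of such an estimate might look like the following. All constants here are illustrative assumptions (data-centre PUE, per-GB memory power, and grid carbon intensity vary by platform and region), so this sketch does not reproduce the calculator's exact figures:

```python
def estimated_footprint(runtime_h, n_gpus, gpu_power_w, mem_gb,
                        pue=1.67, mem_power_w_per_gb=0.3725,
                        carbon_intensity=253.0, usage=1.0):
    """Rough energy/carbon estimate in the spirit of the Green Algorithms
    calculator (Lannelongue et al., 2021): power draw from processors and
    memory, scaled by runtime and data-centre efficiency (PUE), then
    converted to grams of CO2e via a grid carbon intensity (gCO2e/kWh).
    All default constants are illustrative assumptions."""
    power_w = n_gpus * gpu_power_w * usage + mem_gb * mem_power_w_per_gb
    energy_kwh = runtime_h * power_w * pue / 1000.0
    carbon_g = energy_kwh * carbon_intensity
    return energy_kwh, carbon_g

# e.g. a ~762 s training run on one GPU with 16 GB of memory (hypothetical setup)
energy, carbon = estimated_footprint(761.92 / 3600, 1, 250, 16)
print(round(energy * 1000, 2), "Wh,", round(carbon, 2), "g CO2e")
```

Plugging in each model's recorded runtime and hardware configuration yields the comparable energy and carbon figures reported in the cost table.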

49 of 56

Performance cost

  • ConflictDT increases F1-score by 2% with only a 4.95% increase in runtime,
  • while KD Hidden Embeddings improves F1-score by 6% but increases runtime by 46.4%.
  • Overall, the BERT classification model offers the best performance-to-cost ratio.

Model         Runtime (sec)   Carbon Footprint (g CO2e)   Energy Consumption (Wh)
ConflictDT    799.60          2.19                        23.01
BERT          761.92          2.06                        21.65
Sequential    2384.98         6.46                        67.76
LLM-Prob      8451.17         22.96                       241.16
LLM-Hidden    1115.31         3.03                        31.84
LLM-Soft      8445.70         72.95                       240.99

50 of 56

Thematic Analysis

  • This experiment conducts a qualitative analysis of ConflictDT classifier outputs on the six-class conflict dataset, focusing on patterns of misclassification.

Virginia Braun and Victoria Clarke. Using thematic analysis in psychology. Qualitative research in psychology, 3(2):77–101, 2006.

  • Thematic analysis enables us to identify recurring themes and behavioural patterns within misclassified online comments.

51 of 56

Six phase framework

  • Familiarization with the data,
  • Generation of initial codes,
  • Searching for themes,
  • Reviewing themes,
  • Defining and naming themes,
  • and
  • Producing the report.
  • Themes help in understanding the performance of a classification model.
    • For example: If a model misclassifies examples consistently, thematic analysis can identify whether specific patterns in the data (e.g., sarcasm in sentiment analysis) are causing errors.

52 of 56

Thematic Analysis

  • "Oh, great, another selfie of you and your cat. Because the internet desperately needed that. 😴" (Model label: Sarcasm; Label: Sarcasm)
  • "I think it's brilliant that you hate having to rush to pack things! I'm happy to pack at my own speed too." (Model label: Criticism; Label: Teasing)
  • "Your opinions are worthless and don't deserve any respect." (Model label: Harassment; Label: Criticism)
  • "You must be so proud of yourself for coming up with that gem 😂" (Model label: Trolling; Label: Harassment)
  • "Wow, you must be a genius to know everything about cats. 😹" (Model label: Sarcasm; Label: Sarcasm)
  • "So thats where I left my mop head! 😂" (Model label: Trolling; Label: Teasing)
  • "Her outfit is so basic, it's like she raided her grandma's closet." (Model label: Trolling; Label: Trolling)

53 of 56

Thematic Analysis

  • Identify themes and patterns explaining why the datapoints were misclassified

  • "I think it's brilliant that you hate having to rush to pack things! I'm happy to pack at my own speed too." (Model label: Criticism; Label: Teasing)
    • Thematic code: Hard to see any criticism; the text is more lighthearted
  • "Your opinions are worthless and don't deserve any respect." (Model label: Harassment; Label: Criticism)
    • Thematic code: Direct criticism of the user
  • "You must be so proud of yourself for coming up with that gem 😂" (Model label: Trolling; Label: Harassment)
    • Thematic code: Insult led to harassment coding
  • "So thats where I left my mop head! 😂" (Model label: Trolling; Label: Teasing)
    • Thematic code: Lighthearted joking, not looking to incite a response or cause hurt

54 of 56

Thematic Analysis - Results

55 of 56

Conclusion

  • We have introduced a conflict dataset
    • Looked into its characteristics
  • We have introduced a Decision Transformer based Conflict detection model
    • Knowledge Distillation methods
  • We examined the reasons for misclassification
    • Linguistic fluidity
    • Context dependency and
    • Humour ambiguity

56 of 56

For any questions or queries contact either:
Joemon M Jose (joemon.jose@glasgow.ac.uk)
Oliver Warke (oliver.warke.1@research.gla.ac.uk)


#UofGWorldChangers

@UofGlasgow