1 of 12

The Future of AI in Online Safety

Roy Ka-Wei Lee, Assistant Professor

Singapore University of Technology and Design (SUTD)

TikTok

13 Dec 2024

2 of 12

WARNING: The following talk contains depictions of violence and discrimination that may be disturbing to some participants. Discretion is advised.

3 of 12

Wide Spectrum of Online Harm

Online Harm

Hate Speech

Deepfakes

Fake News

Cyberbullying

Sexual Harassment

Dangerous viral challenges

False Rumors

Many other harms…

4 of 12

What is Hate Speech?

The United Nations defines hate speech as “any kind of communication in speech, writing or behaviour, that attacks or uses pejorative or discriminatory language with reference to a person or a group on the basis of who they are, in other words, based on their religion, ethnicity, nationality, race, colour, descent, gender or other identity factor.”

5 of 12

Hate Speech Works from Social AI Studio

  • Hateful content detection algorithms developed by Social AI Studio

DeepHate - Deep learning model that uses multi-facet information (semantics, sentiment, topics) for hate speech detection.

AngryBERT - Multi-task learning framework that enables joint learning of target and emotion for hate speech detection.

HateGAN - Generative adversarial network that generates hateful social media posts for data augmentation.

HEAR - Deep recursive network to perform early hate speech propagation prediction.

DisMultiHate - Disentangle target entities in hateful memes to improve classification and explainability.

Explainable Hateful Meme - Perform visual-text slur grounding to understand hateful memes.

6 of 12

SGHateCheck

  • SGHateCheck is a novel framework for detecting hate speech in Singapore’s main languages.
    • It builds on the existing HateCheck framework
    • Covers various hate speech scenarios, such as slurs, derogatory remarks, and threats
  • Goal: Create a functional hate speech test dataset that captures Singapore’s linguistic diversity: English, Mandarin, Malay, and Tamil
    • Over 21,000 test cases with balanced hateful and non-hateful content.
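
To make the idea of a functional test concrete, here is a minimal sketch of how such a suite could be scored: each case carries a language, a functionality (for example slurs or threats), a text, and a gold label, and accuracy is reported per language and functionality. The names `FunctionalCase`, `accuracy_by_functionality`, and `keyword_classifier`, the placeholder tokens, and the four toy cases are all hypothetical; they are not taken from SGHateCheck or its code.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Tuple


@dataclass(frozen=True)
class FunctionalCase:
    """One templated test case (hypothetical schema, not the SGHateCheck format)."""
    language: str
    functionality: str
    text: str
    is_hateful: bool


def accuracy_by_functionality(
    cases: Iterable[FunctionalCase],
    classify: Callable[[str], bool],
) -> Dict[Tuple[str, str], float]:
    """Return accuracy for each (language, functionality) group."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for case in cases:
        key = (case.language, case.functionality)
        totals[key] += 1
        if classify(case.text) == case.is_hateful:
            correct[key] += 1
    return {key: correct[key] / totals[key] for key in totals}


def keyword_classifier(text: str) -> bool:
    """Toy stand-in for a real model: flags text containing a slur placeholder."""
    return "[SLUR]" in text


# Placeholder tokens stand in for real slurs and group names.
cases = [
    FunctionalCase("English", "slur", "You are a [SLUR].", True),
    FunctionalCase("English", "slur", "I had lunch with my neighbours.", False),
    FunctionalCase("English", "threat", "I will hurt all [GROUP] people.", True),
    FunctionalCase("English", "threat", "I will visit my [GROUP] friends.", False),
]

report = accuracy_by_functionality(cases, keyword_classifier)
for (language, functionality), accuracy in sorted(report.items()):
    print(f"{language:8s} {functionality:8s} accuracy={accuracy:.2f}")
```

Grouping by functionality is what makes this kind of test diagnostic: the toy keyword classifier handles the slur cases but misses the threat that contains no slur, which an aggregate score would hide.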

SGHateCheck: Functional Tests for Detecting Hate Speech in Low-Resource Languages of Singapore [WOAH’24]

7 of 12

Benchmarking Models on SGHateCheck

  • Evaluate LLaMA-2 (LL), mBERT (MB), Mistral (MI), SEA-LION (SO), SeaLLM (SM)
  • Mistral and SeaLLM perform relatively well in Singlish, Malay and Mandarin hate speech detection
  • SEA-LION performs poorly on all Singapore-based hate speech
  • All models perform poorly on Tamil
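
A comparison like the one above comes down to scoring each model separately on each language. The slide does not say which metric was used, so the sketch below assumes macro-averaged F1 over the hateful and non-hateful classes. The function names, the record layout, and the model and language labels are hypothetical, and the toy records are invented for illustration; they are not results from the paper.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Record = Tuple[str, str, bool, bool]  # (model, language, gold label, prediction)


def f1_for_class(pairs: List[Tuple[bool, bool]], positive: bool) -> float:
    """F1 score treating `positive` as the target class; 0.0 if it has no true positives."""
    tp = sum(1 for gold, pred in pairs if gold == positive and pred == positive)
    fp = sum(1 for gold, pred in pairs if gold != positive and pred == positive)
    fn = sum(1 for gold, pred in pairs if gold == positive and pred != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1_table(records: Iterable[Record]) -> Dict[Tuple[str, str], float]:
    """Macro-F1 over the two classes for every (model, language) pair."""
    grouped = defaultdict(list)
    for model, language, gold, pred in records:
        grouped[(model, language)].append((gold, pred))
    return {
        key: (f1_for_class(pairs, True) + f1_for_class(pairs, False)) / 2
        for key, pairs in grouped.items()
    }


# Invented toy records with placeholder names.
records = [
    ("model_a", "language_1", True, True),
    ("model_a", "language_1", False, False),
    ("model_a", "language_2", True, False),
    ("model_a", "language_2", False, False),
]

scores = macro_f1_table(records)
for (model, language), score in sorted(scores.items()):
    print(f"{model} on {language}: macro-F1 = {score:.3f}")
```

Macro-averaging gives both classes equal weight, so a model that labels everything non-hateful in one language scores poorly there even when most of its predictions happen to be correct.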

8 of 12

Hateful Meme Detection

  • Hateful memes - target certain communities and/or individuals by portraying them in a derogatory manner
  • Challenging research problem:
    • Hateful context arises from multiple modalities.
    • Memes lack contextual information (e.g., the target of a hateful meme)

Demystifying Hateful Content: Leveraging Large Multimodal Models for Hateful Meme Detection with Explainable Decisions [ICWSM’24]

9 of 12

MultiHateClip

  • A multilingual dataset of short hateful YouTube and Bilibili videos with detailed annotations, including offensive segments, targeted victims, and contributing modalities

MultiHateClip: A Multilingual Benchmark Dataset for Hateful Video Detection on YouTube and Bilibili [MM’24]

            Hateful   Offensive   Normal   Total
YouTube         128         194      678    1000
Bilibili         82         256      622    1000

10 of 12

Open Issues and Research Opportunities

  • Implicit (latent) hate speech and evolving lexicon
    • Some hate speech is subtle, and new ways of insulting others keep emerging…
  • Multilingual and multicultural datasets
    • Detecting hate speech in low-resource languages
  • Going beyond text (Multimodality)
    • Detecting hateful multimedia content (memes, videos, live streams, etc.)
  • Improving annotation frameworks
    • Handle cultural context and bias in crowdsourced annotation
    • Unified hate speech taxonomy?
  • Human biases vs. model biases
    • Detecting and reducing model biases from training data
  • Go beyond detection
    • Think about prevention and intervention

11 of 12

Collaborate to do good

  • Governments - Enact clear and comprehensive online safety laws and share best practice with ASEAN members
  • Platforms - Invest in advanced AI-driven content moderation systems tailored to regional nuances
  • NGOs - Launch widespread campaigns to raise awareness about the impact of online hate speech.
  • Academics - Conduct in-depth studies to understand the evolving nature of online hate speech and its societal implications, and collaborate with other stakeholders to provide data-driven insights and recommendations for effective interventions.

[Diagram labels: Governments, Platforms, NGOs, Academics, Online Safety & Trust]

12 of 12

Thank You

Roy Ka-Wei Lee | Assistant Professor

Singapore University of Technology and Design

roy_lee@sutd.edu.sg

User Profiling in Multiple Social Media

Online Safety & Cyber Abuse Research

Social Natural Language Generation

Social Recommender System