1 of 12

The Future of AI in Online Safety

Roy Ka-Wei Lee, Assistant Professor

Singapore University of Technology and Design (SUTD)

TikTok

13 Dec 2024

2 of 12

WARNING: The following talk contains acts of violence and discrimination that may be disturbing to some participants. Discretion is advised.

3 of 12

Wide Spectrum of Online Harm

Online Harm

Hate Speech

Deepfakes

Fake News

Cyberbullying

Sexual Harassment

Dangerous viral challenges

False Rumors

Many other harms…

4 of 12

What is Hate Speech?

The United Nations defines hate speech as… “any kind of communication in speech, writing or behaviour, that attacks or uses pejorative or discriminatory language with reference to a person or a group on the basis of who they are, in other words, based on their religion, ethnicity, nationality, race, colour, descent, gender or other identity factor.”

source: https://www.un.org/en/hate-speech/understanding-hate-speech/what-is-hate-speech

5 of 12

Hate Speech Works from Social AI Studio

Hateful content detection algorithms developed by Social AI Studio

DeepHate - Deep learning model that uses multi-facet information (semantics, sentiment, topics) for hate speech detection.

AngryBERT - Multi-task learning framework that enables joint learning of target and emotion for hate speech detection.

HateGAN - Generative adversarial network that generates hateful social media posts for data augmentation.

HEAR - Deep recursive network to perform early hate speech propagation prediction.

DisMultiHate - Disentangle target entities in hateful memes to improve classification and explainability.

Explainable Hateful Meme - Perform visual-text slur grounding to understand hateful memes.

6 of 12

SGHateCheck

SGHateCheck is a novel framework for detecting hate speech in Singapore’s main languages.

It builds on the existing HateCheck framework
Covers various hate speech scenarios, such as slurs, derogatory remarks, and threats

Goal: Create a functional hate speech test dataset that captures Singapore’s linguistic diversity: English, Mandarin, Malay, and Tamil

Over 21,000 test cases with balanced hateful and non-hateful content.

SGHateCheck: Functional Tests for Detecting Hate Speech in Low-Resource Languages of Singapore [WOAH’24]

7 of 12

Benchmarking Models on SGHateCheck

Evaluate LLaMA-2 (LL), mBERT (MB), Mistral (MI), SEA-LION (SO), SeaLLM (SM)
Mistral and SeaLLM perform relatively well on Singlish, Malay, and Mandarin hate speech detection
SEA-LION performs poorly on all Singapore-based hate speech
All models perform poorly on Tamil
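
As an illustration of how a HateCheck-style functional test suite is typically scored, the sketch below groups test cases by language and functionality and reports accuracy for each group. The test cases, the keyword_classifier stand-in, and all field names are hypothetical; they are not taken from SGHateCheck or from the models listed above.

```python
from collections import defaultdict
from typing import NamedTuple


class FunctionalCase(NamedTuple):
    text: str
    language: str        # e.g. "Singlish", "Malay"
    functionality: str   # e.g. "derogation", "threat", "non-hateful"
    is_hateful: bool     # gold label


def accuracy_by_group(cases, classify):
    """Return {(language, functionality): accuracy} for a classifier."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for case in cases:
        key = (case.language, case.functionality)
        total[key] += 1
        if classify(case.text) == case.is_hateful:
            correct[key] += 1
    return {key: correct[key] / total[key] for key in total}


# Hypothetical placeholder cases, not real SGHateCheck items.
CASES = [
    FunctionalCase("I hate [GROUP]", "Singlish", "derogation", True),
    FunctionalCase("I will hurt [GROUP]", "Singlish", "threat", True),
    FunctionalCase("I hate rainy days", "Singlish", "non-hateful", False),
    FunctionalCase("[GROUP] are my friends", "Malay", "non-hateful", False),
]


def keyword_classifier(text: str) -> bool:
    """Toy stand-in for a real model: flags any text containing 'hate'."""
    return "hate" in text.lower()


if __name__ == "__main__":
    scores = accuracy_by_group(CASES, keyword_classifier)
    for key, acc in sorted(scores.items()):
        print(key, f"{acc:.2f}")
```

The toy classifier misses the threat that contains no keyword and wrongly flags the benign use of "hate"; per-functionality reporting is meant to surface exactly this kind of failure, which a single overall accuracy figure would hide.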

8 of 12

Hateful Meme Detection

Hateful memes - target certain communities and/or individuals by portraying them in a derogatory manner
Challenging research problem:

Hateful context arises from multiple modalities.
Lack of contextual information (e.g., target in hateful memes)

Demystifying Hateful Content: Leveraging Large Multimodal Models for Hateful Meme Detection with Explainable Decisions [ICWSM’24]

9 of 12

MultiHateClip

A multilingual dataset of short hateful YouTube and Bilibili videos with detailed annotations, including offensive segments, targeted victims, and contributing modalities

MultiHateClip: A Multilingual Benchmark Dataset for Hateful Video Detection on YouTube and Bilibili [MM’24]

	Hateful	Offensive	Normal	Total
YouTube	128	194	678	1000
Bilibili	82	256	622	1000

Source: https://www.youtube.com/watch?v=TGz_zGLrmPs

10 of 12

Open Issues and Research Opportunities

Implicit (latent) hate speech and evolving lexicon

Some hate is subtle, and new ways of insulting others keep emerging…

Multilingual and multicultural datasets

Detecting hate speech in low resource languages

Going beyond text (Multimodality)

Detecting hateful multimedia content (memes, videos, live-streaming etc.)

Improving annotation frameworks

Handle cultural context and bias in crowdsourced annotation
Unified hate speech taxonomy?

Human biases vs. model biases

Detecting and reducing model biases from training data

Go beyond detection

Think about prevention and intervention

11 of 12

Collaborate to do good

Governments - Enact clear and comprehensive online safety laws and share best practices with ASEAN members.
Platforms - Invest in advanced AI-driven content moderation systems tailored to regional nuances.
NGOs - Launch widespread campaigns to raise awareness about the impact of online hate speech.
Academics - Conduct in-depth studies to understand the evolving nature of online hate speech and its societal implications, and collaborate with other stakeholders to provide data-driven insights and recommendations for effective interventions.

[Diagram labels: Governments, Platforms, NGOs, Academics, Online Safety & Trust]

12 of 12

Thank You

Roy Ka-Wei Lee | Assistant Professor

Singapore University of Technology and Design

roy_lee@sutd.edu.sg

User Profiling in Multiple Social Media

Online Safety & Cyber Abuse Research

https://www.socialai.studio/

Social Natural Language Generation

Social Recommender System