Localized Trust: Bridging the Divide in Multilingual AI Safety
28 Aug 2025
Roy Ka-Wei Lee
Assistant Professor
Singapore University of Technology and Design
IMDA TSS
Happy Anniversary!
Recap of Past Two Years
>60 LLMs launched*
*according to GPT-5
Recap of Past Two Years - Research
We will continue today’s story from here
WARNING: The following talk contains depictions of violence and discrimination that may be disturbing to some participants. Discretion is advised.
RabakBench: Scaling Human Annotations to Construct Localized Multilingual Safety Benchmarks for Low-Resource Languages*
Gabriel Chua, Leanne Tan, Ziyu Ge, Roy Ka-Wei Lee (NeurIPS’25 - under review)
*Subsequent slides are “stolen” from Leanne’s LorongAI presentation :)
LLMs Achieving SOTA on Multiple Benchmarks
What about localisation?
What About Safety?
What about localised safety?
RabakBench is a localised toxicity benchmark for Singlish/Chinese/Malay/Tamil (N=5.3k).
Constructing RabakBench
RabakBench Data Statistics
Benchmarking Guardrails on RabakBench
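Benchmarking a guardrail on a labelled toxicity set usually means comparing its harmful/safe verdicts with the human labels, broken down by language. The sketch below is a hypothetical illustration, not the RabakBench evaluation code: it assumes binary labels and uses F1 on the harmful class as the metric, and the function name `f1_by_language` and the `(language, gold, predicted)` record format are our own assumptions.

```python
from collections import defaultdict


def f1_by_language(records):
    """Hypothetical helper: per-language F1 for the 'harmful' class.

    records: iterable of (language, gold_is_harmful, predicted_is_harmful).
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0})
    for lang, gold, pred in records:
        c = counts[lang]
        if gold and pred:
            c["tp"] += 1  # harmful text correctly flagged
        elif pred and not gold:
            c["fp"] += 1  # safe text wrongly flagged
        elif gold and not pred:
            c["fn"] += 1  # harmful text missed by the guardrail
    scores = {}
    for lang, c in counts.items():
        denom = 2 * c["tp"] + c["fp"] + c["fn"]
        # F1 = 2TP / (2TP + FP + FN); define as 0.0 when there is nothing to score
        scores[lang] = 2 * c["tp"] / denom if denom else 0.0
    return scores


if __name__ == "__main__":
    toy = [
        ("Singlish", True, True),
        ("Singlish", True, False),
        ("Singlish", False, False),
        ("Tamil", True, False),
        ("Tamil", False, True),
    ]
    print(f1_by_language(toy))
```

Reporting scores per language, rather than one pooled number, is what lets a gap between Singlish and Tamil performance show up at all.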
What Did We Learn from RabakBench?
So What Happens When Guardrails Fail?
Toxicity Red-Teaming: Benchmarking LLM Safety in Singapore’s Low-Resource Languages
Yujia Hu, Ming Shan Hee, Preslav Nakov, Roy Ka-Wei Lee (EMNLP’25)
Red-Teaming the LLM
[Diagram: Harmful Message/Instruction → Guardrail → LLM → Output?]
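The question in the diagram can be framed as a simple loop: send each harmful prompt through the guardrail and, if it is not blocked, record what the LLM produces. This is a hypothetical sketch, not SGToxicGuard code: `guardrail`, `llm`, and `is_toxic` are assumed stand-in callables (a guardrail classifier, the model under test, and a toxicity judge), and the `unsafe_rate` metric is our own illustrative choice.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class RedTeamResult:
    prompt: str
    blocked: bool          # True if the (hypothetical) guardrail stopped the prompt
    output: Optional[str]  # LLM response, or None when blocked


def red_team(
    prompts: List[str],
    guardrail: Callable[[str], bool],
    llm: Callable[[str], str],
) -> List[RedTeamResult]:
    """Pass each harmful prompt through the guardrail, then the LLM if not blocked."""
    results = []
    for prompt in prompts:
        if guardrail(prompt):
            results.append(RedTeamResult(prompt, blocked=True, output=None))
        else:
            results.append(RedTeamResult(prompt, blocked=False, output=llm(prompt)))
    return results


def unsafe_rate(results: List[RedTeamResult], is_toxic: Callable[[str], bool]) -> float:
    """Fraction of prompts that got past the guardrail AND produced toxic output."""
    if not results:
        return 0.0
    unsafe = sum(1 for r in results if not r.blocked and is_toxic(r.output))
    return unsafe / len(results)
```

Keeping the guardrail and the model as separate callables means the same prompts can be replayed with the guardrail switched off (e.g. `guardrail=lambda p: False`) to probe the LLM directly, which is the "guardrail fails" scenario.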
SGToxicGuard
A multilingual red-teaming benchmark to evaluate LLM safety in Singapore’s four languages: Singlish, Chinese, Malay, Tamil.
Red-Teaming Task Overview
Three tasks for the red-teaming evaluation: Task 1 - Toxic Conversation, Task 2 - Toxic QA, and Task 3 - Toxic Tweet Composition.
Constructing SGToxicGuard Dataset
Localized Content Examples:
Task 1 - Toxic Conversation Results
Task 2 - Toxic QA Results
General Observations:
Task 3 - Toxic Tweet Composition Results
Toxicity is most severe in Singlish, Malay, and Tamil, showing a direct link between low-resource status and vulnerability.
What Did We Learn from SGToxicGuard?
Summary - Comparison
Summary - When to Use Which?
Thank You
Roy Ka-Wei Lee | Assistant Professor
Singapore University of Technology and Design