
Auto-translated vs. Post-edited

Capability Assessment

Can the model understand and identify dialects?

Datasets: Belebele, QADI, ADI

Dialect Identification

Cultural Value

Can we rely on translation?

……

World Knowledge

Common Sense Reasoning

Reading Comprehension

Misinformation

Datasets: …


Cognitive Abilities and AraDiCE Datasets

World Knowledge: ArabicMMLU

Common Sense Reasoning: PIQA, OBQA, Winogrande

Reading Comprehension: BoolQ, Belebele

Misinformation: TruthfulQA

Language and Dialects

MSA

LEV

EGY

Arabic culture

Understanding and Generation

Datasets

Dialect Identification: QADI, ADI, ADD

Generation: MADAR, DA Generation

Cultural Understanding

Capabilities and datasets

GULF


Cognitive Abilities and AraDiCE Datasets

World Knowledge: ArabicMMLU

Common Sense Reasoning: PIQA, OBQA, Winogrande

Reading Comprehension: BoolQ, Belebele

Misinformation: TruthfulQA

Language and Dialects

MSA

LEV

EGY

Regional Arabic culture

Dialect Understanding and Generation

Datasets

Dialect Identification: QADI, ADI, ADD

Generation: MADAR, DA Generation

Cultural Understanding

Capabilities and datasets

GLF

AraDiCE-Culture


Evaluation


Grounded Situations (HellaSwag)

NLU (Standard NLP Tasks)

Capabilities/Tasks/Datasets

World Knowledge

Common Sense Reasoning, Morality

Reading Comprehension

Misinformation, Factuality and Bias

Summarization

MSA and dialects: Levantine, Gulf, North African, Egyptian (~18 dialects in total)

Sarcasm

Offensive

Dialect Identification

Natural Language Inference

Factuality (TruthfulQA, AraFact)

Stereotype

Bias

MSA

Mesopotamian

North Levantine

Najdi

Moroccan

Egyptian

MMLU (57 subcategories/subjects)

ArabicMMLU (40 subcategories)

Belebele dataset

MSA and dialects

Exams

Ethics (Morality): Justice, well-being, …

Information Seeking (SituatedQA)

Physical common sense (PIQA)

Elementary school science facts (OpenBookQA)

Grad-school science questions

Natural science questions

Question Answering