ParsiNLU: A Suite of Language Understanding Challenges for Persian
Daniel Khashabi, Arman Cohan, Siamak Shakeri, Pedram Hosseini, Pouya Pezeshkpour, Malihe Alikhani, Moin Aminnaseri, Marzieh Bitaab, Faeze Brahman, Sarik Ghazarian, Mozhdeh Gheini, Arman Kabiri, Rabeeh Karimi Mahabadi, Omid Memarrast, Ahmadreza Mosallanezhad, Erfan Noury, Shahab Raji, Mohammad Sadegh Rasooli, Sepideh Sadeghi, Erfan Sadeqi Azer, Niloofar Safi Samghabadi, Mahsa Shafaei, Saber Sheybani, Ali Tazarv, Yadollah Yaghoobzadeh
Motivation
ParsiNLU: Overview
6 tasks
Manual Annotations
External Resources
Experiments w/ SOTA language models
Task 1: Reading Comprehension (1)
Setup: This task is defined as extracting a substring from a given context paragraph that answers a given question.
سوال: نهاوند جزو کدام استان است؟
Question: Nahavand is part of which province?
پاراگراف: نَهاوند شهری در غرب ایران است. این شهر در جنوب غربی استان همدان قرار گرفته است. نهاوند دارای جمعیت …
Paragraph: Nahavand (Navan) is a city in western Iran. This city is located in the southwestern part of Hamedan province and it is the capital of Nahavand. Nahavand has a population of …
پاسخ: همدان، استان همدان
Answer: Hamedan; Hamedan province
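A minimal sketch of how a predicted span might be scored against the gold answers above. It assumes SQuAD-style token-level F1 with plain whitespace tokenization; the slides report F1 but do not spell out the exact scoring, and the names `token_f1`, `best_f1`, and `is_extractive` are illustrative only.

```python
from collections import Counter

def token_f1(prediction: str, gold: str) -> float:
    """Token-level F1 between one predicted span and one gold answer."""
    pred_tokens = prediction.split()
    gold_tokens = gold.split()
    if not pred_tokens or not gold_tokens:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred_tokens == gold_tokens)
    # Multiset intersection counts each shared token at most as often as it occurs in both.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

def best_f1(prediction: str, gold_answers: list[str]) -> float:
    """Score against every acceptable gold answer and keep the best match."""
    return max(token_f1(prediction, gold) for gold in gold_answers)

def is_extractive(answer: str, paragraph: str) -> bool:
    """The task requires the answer to be a substring of the paragraph."""
    return answer in paragraph

# Example from the slide (paragraph shortened to its first two sentences).
PARAGRAPH = "نهاوند شهری در غرب ایران است. این شهر در جنوب غربی استان همدان قرار گرفته است."
GOLD = ["همدان", "استان همدان"]

print(best_f1("استان همدان", GOLD))
print(is_extractive("استان همدان", PARAGRAPH))
```

Taking the maximum over gold answers means either "همدان" or "استان همدان" earns full credit.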
Task 1: Reading Comprehension (2)
Overview of the data collection pipeline
Short questions
open-ended questions with no concrete answers
Questions
Select minimal and coherent spans that contain the answer
Correct grammatical errors and typos
Question, Answer, Paragraph
Google’s Auto-complete
- a seed set of question terms: Who, Where, …
- repeatedly querying parts
- popular questions of Persian-speaking users of Google
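The mining loop described above (start from seed question terms, then repeatedly query partial questions) could be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' pipeline: `suggest` is a hypothetical callable standing in for an auto-complete service, `toy_suggest` is a hard-coded stub with made-up English questions, and the expansion rule (re-query each completion cut to one more word than the prefix) is one plausible reading of "querying parts".

```python
from collections import deque
from typing import Callable, Iterable

def mine_questions(
    seed_terms: Iterable[str],
    suggest: Callable[[str], list[str]],  # hypothetical auto-complete lookup
    max_queries: int = 50,
) -> list[str]:
    """Breadth-first expansion of question prefixes through an auto-complete function."""
    queue = deque(seed_terms)
    queried: set[str] = set()
    found: list[str] = []
    while queue and len(queried) < max_queries:
        prefix = queue.popleft()
        if prefix in queried:
            continue
        queried.add(prefix)
        for completion in suggest(prefix):
            if completion not in found:
                found.append(completion)
            words = completion.split()
            next_len = len(prefix.split()) + 1
            # Re-query a slightly longer part of the completion to surface more questions.
            if len(words) > next_len:
                queue.append(" ".join(words[:next_len]))
    return found

# Stub data for illustration only; a real run would call an actual suggestion source.
TOY_QUESTIONS = [
    "who invented the telephone",
    "who is the president",
    "where is nahavand",
]

def toy_suggest(prefix: str) -> list[str]:
    return [q for q in TOY_QUESTIONS if q.startswith(prefix) and q != prefix]

mined = mine_questions(["who", "where"], toy_suggest)
print(mined)
```

The `max_queries` cap keeps the expansion bounded, which matters because each completion can spawn further prefixes.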
Task 2: Multiple-Choice QA (1)
بزرگترین قارهی جهان کدام است؟
✔ ۱) آسیا ۲) اروپا ۳) آمریکا ۴) آفریقا
What is the largest continent in the world?
✔ 1) Asia 2) Europe 3) Americas 4) Africa
نجاری روزی یک صندلی و شاگردش در سه روز یک صندلی میسازد. اگر نجار و شاگردش با هم کارکنند، ۱۲ صندلی رو در چند روز می سازند؟
۱) ۱۲ ✔ ۲) ۹ ۳) ۸ ۴) ۶
A carpenter makes one chair a day and his student makes one chair in three days. If the carpenter and his student work together, in how many days will they make 12 chairs?
1) 12 ✔ 2) 9 3) 8 4) 6
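The marked answer to the carpenter question can be checked with a short work-rate calculation: add the two daily rates, then divide the target by the combined rate. The helper name `days_needed` is illustrative.

```python
from fractions import Fraction

def days_needed(rates_per_day: list[Fraction], target: int) -> Fraction:
    """Days required when workers with the given daily rates work together."""
    combined_rate = sum(rates_per_day, Fraction(0))
    return Fraction(target) / combined_rate

carpenter = Fraction(1, 1)  # one chair per day
student = Fraction(1, 3)    # one chair every three days

days = days_needed([carpenter, student], 12)
print(days)  # combined rate is 4/3 chairs per day, so 12 / (4/3) = 9
```

Exact fractions avoid the rounding noise that floating-point division by 1.333... would introduce.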
Task 2: Multiple-Choice QA (2)
Task 3: Sentiment Analysis (1)
We explored two relatively less investigated domains in Persian Sentiment Analysis:
Our sentiment labels are on a 5-point Likert scale, [-2, -1, 0, +1, +2], at two levels:
Example: “It tastes good but it’s so expensive even with a special offer. It’s almost double the price of fresh meat.”
Labels:
Task 3: Sentiment Analysis (2)
Defining sentiment aspects
Training annotators
Final annotation
Food & beverages aspects: Purchase value/price, Packaging, Delivery, Product quality, Nutritional value, Taste/smell
Movie review aspects: Music, Sound, Directing, Story/screenplay, Acting/performance, Cinematography, Scene
Review sources
Annotation process
Total annotated reviews: 2,423
Annotation tasks and Cohen’s Kappa agreement:
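For reference, Cohen's kappa between two annotators can be computed as in the sketch below. The label lists are made-up toy values on the 5-point scale, not data from the annotation effort, and `cohens_kappa` is an illustrative helper rather than the authors' script.

```python
from collections import Counter

def cohens_kappa(labels_a: list[int], labels_b: list[int]) -> float:
    """Cohen's kappa: agreement between two annotators, corrected for chance."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both annotators.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement: chance of matching if each annotator labeled
    # independently according to their own label distribution.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(
        counts_a[label] * counts_b[label] for label in counts_a.keys() | counts_b.keys()
    ) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)

# Toy labels on the [-2, -1, 0, +1, +2] scale (illustrative only).
annotator_1 = [2, 1, 0, -1, -2, 2]
annotator_2 = [2, 1, 0, -1, -2, 2]
print(cohens_kappa(annotator_1, annotator_2))
```

Kappa is 1 for perfect agreement and 0 when agreement is no better than chance.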
Task 4: Textual Entailment (1)
Setup: This task is defined as determining the 3-way relationship between two sentences: Entailment, Neutral, or Contradiction.
1- Based on natural sentences: writing a premise and hypothesis based on existing sentences
2- Based on existing datasets: translating MNLI instances using Google Translate
Premise: Poor people in more than a couple of counties in Atlanta receive help from the Atlanta Legal Aid.
پیش فرض: مردم فقیر در بیش از چند شهرستان در آتلانتا از کمک حقوقی آتلانتا کمک میگیرند.
Hypothesis: Atlanta Legal Aid provides civil services to poor people in five metro Atlanta counties.
فرضیه: کمک حقوقی آتلانتا به مردم فقیر در پنج منطقه شهری آتلانتا خدمات مدنی ارائه می دهد.
Task 4: Textual Entailment (2)
Overview of the data collection pipeline:
1- Sampling Sentences with Conjunctive Adverbs from Persian Wikipedia
2- MNLI Dataset
Google Translate (En-Fa)
S1: فرانسه بارها اعلام داشته که بحران اقتصادی بین المللی است ، پس چاره کار هم باید جهانی باشد.
France has repeatedly stated that the economic crisis is international, so the solution must also be global.
S2: صدام حسين می گويد ترجيح می دهد بميرد ولی به تبعيد نرود.
Saddam Hussein says he would rather die than go into exile.
S3: پنج تن از آنها به ژاپن بازگشتند همچنین ممکن است بقیه ی ربوده شدگان هنوز زنده باشند.
Five of them came back to Japan. Also, the rest of the abductees might still be alive.
P: Corona virus spreads mainly between people who are in close contact with each other.
ویروس کرونا عمدتا بین افرادی که در تماس نزدیک با یکدیگر هستند ، گسترش می یابد.
H: People can be infected when droplets containing the virus are inhaled.
افراد هنگام استنشاق قطرات حاوی ویروس می توانند آلوده شوند.
Textual Entailment
Dataset
Human Annotation
Fixing Translations
Task 5: Question Paraphrasing (1)
Setup: This task is defined as determining whether two given questions are paraphrases or not:
(1) Based on natural sentences: mining questions using Google auto-complete, then creating pairs of questions with high token overlap
(2) Based on existing datasets: getting question pairs from the QQP dataset, then translating them using Google Translate
سوال ۱: کدام شهرهای ایران در وضعیت سفید کرونا هستند؟
Q1: Which cities in Iran are in white zones for corona?
سوال ۲: کدام شهرهای ایران در وضعیت قرمز کرونا هستند؟
Q2: Which cities in Iran are in red zones for corona?
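One way to find question pairs with high token overlap is sketched below, assuming Jaccard similarity over whitespace tokens and a hypothetical threshold of 0.6. The slides do not state which overlap measure or cutoff was used, so both are illustrative choices, as are the function names.

```python
from itertools import combinations

def token_overlap(q1: str, q2: str) -> float:
    """Jaccard similarity between the whitespace-token sets of two questions."""
    tokens_1, tokens_2 = set(q1.split()), set(q2.split())
    if not tokens_1 or not tokens_2:
        return 0.0
    return len(tokens_1 & tokens_2) / len(tokens_1 | tokens_2)

def high_overlap_pairs(questions: list[str], threshold: float = 0.6) -> list[tuple[str, str]]:
    """All unordered question pairs whose token overlap reaches the threshold."""
    return [
        (q1, q2)
        for q1, q2 in combinations(questions, 2)
        if token_overlap(q1, q2) >= threshold
    ]

# The two corona questions from this slide plus an unrelated question.
QUESTIONS = [
    "کدام شهرهای ایران در وضعیت سفید کرونا هستند؟",
    "کدام شهرهای ایران در وضعیت قرمز کرونا هستند؟",
    "آیا جهان موازی وجود دارد؟",
]

pairs = high_overlap_pairs(QUESTIONS)
print(len(pairs))
print(token_overlap(QUESTIONS[0], QUESTIONS[1]))
```

The white-zone and red-zone questions share seven of nine distinct tokens yet ask different things, which is why candidate pairs selected this way still need a human paraphrase label.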
Task 5: Question Paraphrasing (2)
Overview of the data collection pipeline:
Incomplete Seed Sentences
QQP Dataset
Google’s Auto-complete
Google Translate (En-Fa)
Question Paraphrasing
Dataset
Human Annotation
Q1. آخرین روزهای زندگی خود را در کجا میخواهید سپری کنید؟
(Where do you want to spend your last days of life?)
Q2. آیا جهان موازی وجود دارد؟
(Are there any parallel universes?)
Q3. استارتاپ ها چیستند؟
(What are startups?)
Task 6: Translation (1)
آن کسانی که به جان غیب ایمان آرند و نماز به پا دارند و از هرچه روزیشان کردیم به فقیران انفاق کنند.
Who believe in the Unseen, are steadfast in prayer, and spend out of what We have provided for them.
Task 6: Translation (2)
Experiments: Setup
Persian LM
Multilingual LM
Experimental Findings (1)
[Bar chart; y-axis: Performance (F1)]
LMs trained on ParsiNLU Reading Comprehension: WikiBERT (base) 39.2, ParsBERT (base) 40.7, mBERT (base) 49.0, mT5 (large) 49.2, mT5 (XL) 70.4
Human: 86.2
Experimental Findings (2)
[Bar chart; y-axis: Performance (F1)]
LMs trained on ParsiNLU Reading Comprehension: WikiBERT (base) 39.2, ParsBERT (base) 40.7, mBERT (base) 49.0, mT5 (large) 49.2, mT5 (XL) 70.4
LMs trained on SQuAD: mT5 (large) 67.4, mT5 (XL) 68.2
Human: 86.2
Experimental Findings (3)
[Bar chart; y-axis: Performance (F1)]
LMs trained on ParsiNLU Reading Comprehension: WikiBERT (base) 39.2, ParsBERT (base) 40.7, mBERT (base) 49.0, mT5 (large) 49.2, mT5 (XL) 70.4
LMs trained on SQuAD: mT5 (large) 67.4, mT5 (XL) 68.2
LMs trained on ParsiNLU Reading Comprehension + SQuAD: mT5 (large) 73.6, mT5 (XL) 74.7
Human: 86.2
ParsiNLU: Summary