Computational Modeling of Gender in Arabic
Nizar Habash
New York University Abu Dhabi
The New York University Global Network
CAMeL Lab (Computational Approaches to Modeling Language Lab)
@CamelNlp
scholar.camel-lab.com
Roadmap
Arabic and its Variants
Arabic Orthographic Ambiguity
Undiacritized ولعين can be read in several ways:
وَلِعِينَ  "infatuated (m.pl.)"
وَلِعَينٍ  "and for an eye/spring"
وَلَعِينٌ  "and cursed"
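The ambiguity arises because short vowels and other diacritics are usually omitted in writing. A minimal Python sketch, assuming that stripping every Unicode nonspacing mark (category `Mn`) approximates the undiacritized spelling, shows the three readings above collapsing into the single written form ولعين:

```python
import unicodedata


def dediacritize(word: str) -> str:
    """Remove nonspacing marks (Unicode category Mn): Arabic short vowels,
    tanwin, shadda and sukun all fall in this category."""
    return "".join(ch for ch in word if unicodedata.category(ch) != "Mn")


# The three diacritized readings and their glosses.
readings = {
    "وَلِعِينَ": "infatuated (m.pl.)",
    "وَلِعَينٍ": "and for an eye/spring",
    "وَلَعِينٌ": "and cursed",
}

# Group the readings by the spelling left after dediacritization.
by_spelling: dict[str, list[str]] = {}
for diacritized, gloss in readings.items():
    by_spelling.setdefault(dediacritize(diacritized), []).append(gloss)

print(by_spelling)  # a single key (ولعين) holding all three glosses
```

A reader, or a model, seeing only ولعين has to recover which of the three entries was meant from context.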
وسنقولها
/wasanaqūluhā/
و+ س+ ن+ قول + ها
wa+sa+na+qūl+u+hā
and+will+we+say+it
And we will say it
قال، قالت، قالا، قالوا، قلتَ، قلتِ، قلتما، قلتم، قلتن،
يقولُ، يقولَ، يقل، تقولُ، تقولَ، تقل، تقولين، تقولي،
... فقال، فقالت، فقالا ...
... وسأقولها، وسنقولها، ...
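Clitics multiply this ambiguity, since one written word can hide several segmentations. The sketch below enumerates candidate splits of وسنقولها using tiny, hypothetical inventories (`CONJ`, `FUT`, `PERSON` and `OBJ` are illustrative names, not a real analyzer's lexicon); a real morphological analyzer would additionally check each stem against a lexicon to prune spurious splits:

```python
# Hypothetical toy inventories (undiacritized); "" means "absent".
CONJ = ["", "و", "ف"]              # wa+ / fa+ "and, so"
FUT = ["", "س"]                    # sa+ "will"
PERSON = ["", "أ", "ن", "ت", "ي"]  # imperfective subject prefixes
OBJ = ["", "ها", "ه", "هم"]        # object pronoun enclitics


def segmentations(word: str, min_stem: int = 2) -> list[tuple[str, ...]]:
    """Enumerate every split of `word` into optional affixes plus a stem."""
    results = []
    for conj in CONJ:
        if not word.startswith(conj):
            continue
        rest = word[len(conj):]
        for fut in FUT:
            if not rest.startswith(fut):
                continue
            rest2 = rest[len(fut):]
            for person in PERSON:
                if not rest2.startswith(person):
                    continue
                rest3 = rest2[len(person):]
                for obj in OBJ:
                    if not rest3.endswith(obj):
                        continue
                    stem = rest3[: len(rest3) - len(obj)]
                    if len(stem) < min_stem:
                        continue
                    # Keep only the segments that are actually present.
                    results.append(tuple(p for p in (conj, fut, person, stem, obj) if p))
    return results


for seg in segmentations("وسنقولها"):
    print("+".join(seg))
```

Only one of the eight splits it returns is the full segmentation و+ س+ ن+ قول+ ها shown above; the others illustrate how quickly candidates multiply even with four small inventories.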
Arabic Morphological Richness
Arabic Morphological Complexity: Beyond the Binary
(Alkuhlani & Habash, 2011)
Notation: <Form>/<Function> (form-based vs. functional gender and number)
Complex Morphosyntactic Agreement
سكنت الموسيقية الطرشاء في ثلاث مدن شهيرة
"The deaf musician lived in three famous cities." (Evelyn Glennie)
| شهيرة | مدن | ثلاث | في | الطرشاء | الموسيقية | سكنت |
| famous FS | cities FP | three M | in | the-deaf FS | the-musician FS | lived FS |
| irrational-plural agreement: P → FS | templatic; form MS | inverse gender agreement in numbers | | templatic; form MS | | |
(Alkuhlani & Habash, 2011)
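The agreement row of the table can be read as a rule over functional, not form-based, features. A minimal sketch, assuming a simplified Modern Standard Arabic rule in which irrational (non-human) plurals take feminine singular agreement and all other nouns pass their functional gender and number to the adjective (`Noun` and `adjective_agreement` are hypothetical names):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Noun:
    gender: str     # functional gender: "M" or "F"
    number: str     # functional number: "S", "D" (dual) or "P"
    rational: bool  # True for human (rational) referents


def adjective_agreement(noun: Noun) -> tuple[str, str]:
    """Gender/number an attributive adjective takes (simplified MSA rule):
    irrational plurals trigger feminine singular; otherwise copy the noun."""
    if noun.number == "P" and not noun.rational:
        return ("F", "S")
    return (noun.gender, noun.number)


cities = Noun(gender="F", number="P", rational=False)          # مدن, functional FP
musician = Noun(gender="F", number="S", rational=True)         # الموسيقية
cities_by_form = Noun(gender="M", number="S", rational=False)  # مدن, form MS

print(adjective_agreement(cities))          # ('F', 'S') -> شهيرة
print(adjective_agreement(musician))        # ('F', 'S') -> الطرشاء
print(adjective_agreement(cities_by_form))  # ('M', 'S') -> would wrongly give شهير
```

Feeding the form-based features of مدن (MS) into the same rule predicts masculine شهير instead of the attested شهيرة, which is why the <Form>/<Function> distinction matters.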
A Note on Gender-Neutral Arabic: Existing Tactics
… beyond the masculine generic
A Note on Gender-Neutral Arabic: Emerging Tactics
A Note on Gender-Neutral Arabic: Experimental Tactics
Ibdal factory for Language and Queer Translations
Roadmap
MT 2023
Gender translation errors in machine translation for morphologically rich languages persist
male doctor
female nurse
The world is biased. The data is biased. The models are biased.
But even if we fix all of these, we will still have a problem!
NLP systems are mostly really fragile, gender-unaware, single-output systems.
male nurse / female nurse
male doctor / male doctor
MT 2023: inconsistent outputs
[Figure: grid of ♂/♀ gender markings illustrating inconsistent MT outputs]
MT 2023: inconsistent agreement
ChatGPT MT 2023: better, but…
Consistent output, but with the typical single-output gender bias
Roadmap
Arabic Gender Rewriting Task
Work with my PhD student, Bashar Alhafni
NLP system output: أنا طبيب رائع “I am a wonderful [male] doctor”
Gender Rewriting System:
Target Gender: Feminine → أنا طبيبة رائعة “I am a wonderful [female] doctor”
Target Gender: Masculine → أنا طبيب رائع “I am a wonderful [male] doctor”
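At its simplest, the task maps gendered words to their counterparts for the requested target gender. The sketch below uses a hypothetical two-entry lexicon (`M_TO_F`) and word-by-word lookup. It reproduces the example above but ignores context, so it would also rewrite a third-person sentence such as هو طبيب رائع ("he is a wonderful doctor"), which should stay masculine:

```python
# Hypothetical toy lexicon of first-person masculine -> feminine forms.
M_TO_F = {"طبيب": "طبيبة", "رائع": "رائعة"}
F_TO_M = {fem: masc for masc, fem in M_TO_F.items()}


def rewrite(sentence: str, target: str) -> str:
    """Word-by-word lookup toward the target gender ("M" or "F")."""
    table = M_TO_F if target == "F" else F_TO_M
    return " ".join(table.get(tok, tok) for tok in sentence.split())


print(rewrite("أنا طبيب رائع", "F"))  # أنا طبيبة رائعة
print(rewrite("أنا طبيب رائع", "M"))  # أنا طبيب رائع
```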
Arabic Parallel Gender Corpus (APGC) v2.0 (Alhafni et al., 2022a)
We developed an Arabic parallel gender corpus
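A parallel corpus makes word-level rewrites learnable from data. As an illustration, assuming sentence pairs whose tokens align one-to-one by position (a simplification, since real pairs can differ in length), the following sketch counts masculine-to-feminine substitutions and keeps the most frequent one per word. The two pairs are made-up examples, not APGC entries, and the function names are hypothetical:

```python
from collections import Counter


def extract_rewrite_pairs(parallel):
    """Count word substitutions between aligned masculine/feminine sentences."""
    lexicon: dict[str, Counter] = {}
    for masc, fem in parallel:
        m_toks, f_toks = masc.split(), fem.split()
        if len(m_toks) != len(f_toks):
            continue  # this naive positional aligner cannot handle the pair
        for m, f in zip(m_toks, f_toks):
            if m != f:
                lexicon.setdefault(m, Counter())[f] += 1
    return lexicon


def most_likely(lexicon):
    """Keep the most frequent rewrite observed for each masculine word."""
    return {m: counts.most_common(1)[0][0] for m, counts in lexicon.items()}


PAIRS = [
    ("أنا طبيب رائع", "أنا طبيبة رائعة"),  # "I am a wonderful doctor"
    ("أنا سعيد", "أنا سعيدة"),            # "I am happy"
]
print(most_likely(extract_rewrite_pairs(PAIRS)))
```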
Multi-User Gender Rewriting Model (Alhafni et al., 2022b)
I. Gender Identification (GID):
II. Out-of-context Word Gender Rewriting:
III. In-context Ranking & Selection:
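The three steps can be sketched end to end with toy components, each an assumption standing in for a trained model. A lookup table replaces the gender identifier, with labels `B` (gender-neutral) and `1M`/`1F`/`2M`/`2F` (first or second person, masculine or feminine). A candidate table replaces the out-of-context rewriter. Made-up bigram counts stand in for whatever in-context scorer performs the ranking. The target is a hypothetical dict giving the desired gender per person, which is what makes the setup multi-user:

```python
from itertools import product

# Step I stand-in: word-level gender labels from a hypothetical lookup table.
# "B" = gender-neutral; "1M"/"1F" = first person (speaker), masc./fem.;
# "2M"/"2F" = second person (listener), masc./fem.
GID_LEXICON = {"طبيب": "1M", "طبيبة": "1F", "رائع": "1M", "رائعة": "1F"}

# Step II stand-in: out-of-context candidates for (word, desired gender).
REWRITES = {
    ("طبيب", "F"): ["طبيبة"],
    ("رائع", "F"): ["رائعة", "رائعات"],  # includes a wrong (plural) candidate
    ("طبيبة", "M"): ["طبيب"],
    ("رائعة", "M"): ["رائع"],
}

# Step III stand-in: made-up bigram counts used to score whole sentences.
BIGRAMS = {("أنا", "طبيبة"): 3, ("طبيبة", "رائعة"): 2}


def identify(tokens):
    """Step I: label each token (toy lookup instead of a classifier)."""
    return [GID_LEXICON.get(tok, "B") for tok in tokens]


def candidates(token, label, target):
    """Step II: propose rewrites for one token, ignoring its context."""
    if label == "B":
        return [token]
    person, gender = label[0], label[1]
    desired = target[person]
    if gender == desired:
        return [token]
    return REWRITES.get((token, desired), [token])


def score(tokens):
    """Step III helper: sum bigram counts over adjacent token pairs."""
    return sum(BIGRAMS.get(pair, 0) for pair in zip(tokens, tokens[1:]))


def rewrite(sentence, target):
    """Run the three steps; `target` maps person ("1", "2") to "M" or "F"."""
    tokens = sentence.split()
    labels = identify(tokens)
    options = [candidates(t, lab, target) for t, lab in zip(tokens, labels)]
    # Step III: rank every combination of candidates in context, keep the best.
    best = max(product(*options), key=score)
    return " ".join(best)


print(rewrite("أنا طبيب رائع", {"1": "F", "2": "M"}))  # أنا طبيبة رائعة
```

Enumerating every combination with `itertools.product` is exponential in the number of gendered words; it is only acceptable here because the toy sentence has two.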
Evaluation & Results
Evaluation:
Baselines:
Multi-User Gender Rewriting Evaluation & Results
Results on Dev:
Results on Test:
Error Analysis:
Automatic Post-Editing of MT Output
Check out our demo poster…
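Because the rewriter operates on text, it can post-edit the output of an unchanged, gender-unaware MT system and turn one output into several. A minimal sketch, where `post_edit_all`, `toy_translate` and `toy_rewrite` are hypothetical stand-ins for a real MT system and rewriting model:

```python
from itertools import product
from typing import Callable


def post_edit_all(
    source: str,
    translate: Callable[[str], str],
    rewrite: Callable[[str, dict], str],
) -> dict[str, str]:
    """Translate once, then rewrite the single MT output for every
    combination of first-person (1) and second-person (2) gender."""
    mt_output = translate(source)
    variants = {}
    for g1, g2 in product("MF", repeat=2):
        variants[f"1{g1}/2{g2}"] = rewrite(mt_output, {"1": g1, "2": g2})
    return variants


# Hypothetical stand-ins for a real MT system and a real rewriting model.
def toy_translate(text: str) -> str:
    return {"I am a wonderful doctor": "أنا طبيب رائع"}[text]


M_TO_F = {"طبيب": "طبيبة", "رائع": "رائعة"}


def toy_rewrite(sentence: str, target: dict) -> str:
    # Assumes masculine first-person input; second person is ignored here.
    if target["1"] == "F":
        return " ".join(M_TO_F.get(tok, tok) for tok in sentence.split())
    return sentence


outputs = post_edit_all("I am a wonderful doctor", toy_translate, toy_rewrite)
for key, text in outputs.items():
    print(key, text)
```

The MT system itself stays single-output; the multiple, gender-specific outputs come entirely from the post-editing step.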
Roadmap
Outlook
Questions?
Nizar Habash
nizar.habash@nyu.edu