1 of 20

MasakhaPOS: Part-of-Speech Tagging for Typologically Diverse African Languages

ACL 2023

July 11, 2023

Presenter:

David Ifeoluwa Adelani (@davlanade)

UCL, United Kingdom

2 of 20


3 of 20

Motivation

  • Part-of-Speech (POS) tagging is the process of assigning the most probable grammatical category (or tag) to each word (or token) in a sentence of a given natural language (see the small example below).
  • A fundamental step for many NLP applications, such as:
    • machine translation,
    • parsing,
    • text chunking,
    • speech recognition and synthesis,
    • spell and grammar checking.
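
For illustration only (the sentence and tags are an example, not drawn from the MasakhaPOS data), a POS tagger maps each token to one of the universal POS tags, e.g. in Python:

# Illustrative only: the kind of output a POS tagger produces,
# using the universal POS tag set from Universal Dependencies.
tagged_sentence = [
    ("The", "DET"), ("child", "NOUN"), ("likes", "VERB"),
    ("mangoes", "NOUN"), (".", "PUNCT"),
]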

3

Figure obtained from Joakim Nivre. Multilingual Dependency Parsing from Universal Dependencies to Sesame Street. In TSD, 2020.

There is no large-scale POS dataset for African languages; only one Sub-Saharan African language (Wolof) has training data in UD.

4 of 20

Contributions

  • We develop MasakhaPOS, the largest POS dataset for 20 typologically diverse African languages.
  • We highlight the challenges of annotating POS for these languages under the Universal Dependencies (UD) guidelines.
  • We explore better ways to extend POS tagging to unseen languages by leveraging cross-lingual transfer methods and geographically/syntactically related source languages.
  • We develop POS taggers using CRF and multilingual pre-trained language models.


5 of 20

Languages and data size

The annotated corpus is drawn from the news domain.

NC: Niger-Congo


[Map of Africa with the focus languages labelled: Hausa, Kiswahili, chiShona, Setswana, isiXhosa, isiZulu, Chichewa, Dholuo, Luganda, Kinyarwanda, Ghomala, Wolof, Mossi, Twi, Ewe, Fon, Yoruba, Igbo, Naija]

Approximate data split: 800 / 100 / 600 train / dev / test sentences.

6 of 20

Annotation methodology

  • The annotation guide is based on Universal Dependencies (UD)

  • Number of part-of-speech tags: 17


Annotation workflow:

  1. Manually label {n=100} sentences.
  2. Train a POS tagger (RemBERT) on the manually labelled sentences.
  3. Automatically label {n=1400} sentences with the tagger.
  4. Fix the incorrect tags.

https://universaldependencies.org/u/pos/
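
A minimal sketch of the bootstrapping step above, assuming a RemBERT tagger already fine-tuned on the manually labelled seed sentences (the checkpoint name and helper function are hypothetical):

# Hypothetical pre-annotation helper; the checkpoint name is illustrative.
from transformers import pipeline

tagger = pipeline(
    "token-classification",
    model="rembert-pos-seed",        # hypothetical RemBERT checkpoint trained on the ~100 seed sentences
    aggregation_strategy="simple",
)

def pre_annotate(sentences):
    """Propose POS tags automatically; annotators then fix the incorrect ones."""
    for sent in sentences:
        preds = tagger(sent)
        yield [(p["word"], p["entity_group"]) for p in preds]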

7 of 20

Challenges in annotating POS with UD guidelines

  1. Tokenization and word segmentation
    • Definition of a word
      • chiShona word: ndakazomuona
      • In English: I eventually saw him
    • Clitics (see the CoNLL-U sketch after this list)
      • Wolof words: ci ab -> cib
      • In English: in a
    • One unit or multitoken words?
      • In Setswana: ngwana yo o ratang
      • In English: the child who likes
  2. POS ambiguities
    • Verb or conjunction?
      • Example 1:
        • Yorùbá: Olú gbàgbé Bolá tí jàde
        • English: Olu forgot that Bola has gone
      • Example 2:
        • Yorùbá: Olú wọn wá
        • English: Olu said they came
    • Adjective or verb?
    • Adverb or particle?
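
For illustration only (the lemmas and tags below are assumptions, not taken from the MasakhaPOS data): UD handles contractions such as Wolof cib with a multiword-token line whose ID is a range; the surface form carries no tag, while the underlying syntactic words ci and ab are tagged separately.

1-2   cib   _    _
1     ci    ci   ADP
2     ab    ab   DET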

8 of 20

Baseline models

  • Conditional Random Field (CRF)
  • Multilingual pre-trained language models (see the fine-tuning sketch below)
    • Massively multilingual (≥ 100 languages; covers 2-8 of the focus languages)
      • mBERT
      • XLM-R
      • RemBERT
    • African-centric (pre-trained on 11-23 African languages; covers 6-20 of the focus languages)
      • AfriBERTa
      • AfroLM
      • AfroXLMR-base/large
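
As a rough sketch of the PLM baselines (the model name is one plausible choice, not the exact training configuration used), each model is fine-tuned with a token-classification head over the 17 universal POS tags:

# Sketch only: a token-classification head over the 17 universal POS tags.
from transformers import AutoTokenizer, AutoModelForTokenClassification

UD_TAGS = ["ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
           "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X"]

model_name = "Davlan/afro-xlmr-base"   # one of the African-centric PLMs
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(
    model_name,
    num_labels=len(UD_TAGS),
    id2label=dict(enumerate(UD_TAGS)),
    label2id={tag: i for i, tag in enumerate(UD_TAGS)},
)
# Fine-tuning then follows the standard token-classification recipe:
# align word-level tags to subword tokens and train with cross-entropy.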

9 of 20

Baseline results

Fully supervised setting: 800 training sentences

Multilingual pre-trained language models provide better performance than the CRF baseline.


10 of 20

Baseline results

Fully supervised setting: 800 training sentences

Larger PLMs lead to a further boost in performance.


11 of 20

Baseline results

Fully supervised setting: 800 training sentences

African-centric PLMs that cover more African languages are more effective


12 of 20

Cross-lingual Transfer: Adapting to unseen languages

  • It is impossible to create benchmark POS datasets for all of the over 2,000 African languages.
  • We need effective methods for cross-lingual transfer.
  • How do we adapt existing POS datasets to MasakhaPOS?
  • Some tricks to improve cross-lingual transfer:
    • Identify an available dataset in another language (e.g., on UD),
      • preferably a language that is geographically and linguistically related to the target language.
    • Leverage available monolingual data in the target language for improved transfer,
      • especially with parameter-efficient transfer learning methods such as MAD-X.



14 of 20

Transfer results: on aggregate

Geographically close source languages: Afrikaans, Arabic, English, French, Naija, Romanian and Wolof

FT-Eval: Fine-tune PLM on a source language, perform zero-shot evaluation on the target language.

Parameter-efficient fine-tuning methods:

  • LT-SFT (Ansell et al., 2021)
  • MAD-X (Pfeiffer et al., 2020)


15 of 20

Transfer results: fine-grained analysis

How important is the source language when using MAD-X adaptation?

  • Source languages: English, Romanian, Wolof, and Shona

Evaluation on: Fon, Yoruba, and isiZulu


Multi-source adaptation is very effective

16 of 20

Conclusion

  • We developed MasakhaPOS, the largest POS dataset for 20 typologically diverse African languages.
  • We highlighted the challenges of annotating POS for these languages under the Universal Dependencies (UD) guidelines.
  • We developed POS taggers using CRF and multilingual pre-trained language models, in both fully supervised and cross-lingual transfer settings.


17 of 20

Thank you



18 of 20

BACKUP SLIDE


19 of 20

Parameter-Efficient Fine-tuning with Adapters


STEP 1: Train language adapters on every language of interest.

STEP 2: Train the task adapter together with the source language adapter; only the task adapter is updated, all other parameters are frozen.

STEP 3: Zero-shot transfer to the target language by replacing the source language adapter with the target language adapter, while keeping the task adapter.

Ruder 2022: NLP for African languages @Indaba
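
A minimal sketch of these three steps with the adapters library (the adapter hub IDs and adapter names below are assumptions, not the exact configuration used):

# Sketch of MAD-X-style transfer; hub adapter IDs and names are assumptions.
from adapters import AutoAdapterModel
import adapters.composition as ac

model = AutoAdapterModel.from_pretrained("xlm-roberta-base")

# STEP 1: language adapters (loaded here instead of trained, for brevity).
src_lang = model.load_adapter("en/wiki@ukp")   # source language adapter (assumed hub ID)
tgt_lang = model.load_adapter("sw/wiki@ukp")   # target language adapter (assumed hub ID)

# STEP 2: add a POS task adapter and tagging head; train only the task adapter.
model.add_adapter("pos")
model.add_tagging_head("pos", num_labels=17)
model.train_adapter("pos")                     # freezes everything except the task adapter
model.active_adapters = ac.Stack(src_lang, "pos")
# ... fine-tune on source-language POS data ...

# STEP 3: zero-shot transfer: swap in the target language adapter, keep the task adapter.
model.active_adapters = ac.Stack(tgt_lang, "pos")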

20 of 20

Lottery Ticket Sparse Fine-tunings (LT-SFT)

  • Based on the Lottery Ticket Hypothesis (LTH) (Frankle & Carbin, 2019), which states that each neural model contains a sub-network (a "winning ticket") that, if trained again in isolation, can reach or even surpass the performance of the original model.


Ansell et al. 2022. Composable Sparse Fine-Tuning for Cross-Lingual Transfer. In ACL 2022.
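
A simplified sketch of the idea behind LT-SFT, not the authors' implementation (the real method retrains only the selected parameters in a second phase, which is omitted here): after a first full fine-tuning pass, keep only the largest parameter changes as a sparse difference vector, and compose language and task difference vectors by adding them to the base model.

# Simplified sketch of LT-SFT-style sparse fine-tunings (PyTorch).
import torch

def select_ticket(base_state, tuned_state, keep_fraction=0.05):
    """Keep only the top-k most-changed parameters as a sparse difference vector."""
    diffs = {name: tuned_state[name] - base_state[name] for name in base_state}
    magnitudes = torch.cat([d.abs().flatten() for d in diffs.values()])
    k = max(1, int(keep_fraction * magnitudes.numel()))
    threshold = magnitudes.topk(k).values.min()
    return {name: torch.where(d.abs() >= threshold, d, torch.zeros_like(d))
            for name, d in diffs.items()}

def compose(base_state, *sparse_fts):
    """Compose language and task sparse fine-tunings by adding them to the base weights."""
    composed = {name: p.clone() for name, p in base_state.items()}
    for sft in sparse_fts:
        for name, delta in sft.items():
            composed[name] = composed[name] + delta
    return composed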