
When FastText Pays Attention

Efficient Estimation of Word Representations Using Positional Weighting

Vítek Novotný, Michal Štefánik, Dávid Lupták, Eniafe F. Ayetiran, Petr Sojka

MIR research group <mir.fi.muni.cz>, Faculty of Informatics, Masaryk University

Bibliography

  1. KAHNEMAN, Daniel. Thinking, fast and slow. Macmillan, 2011.
  2. PETERS, Ellen, et al. Numeracy and decision making. Psychological Science, 2006, 17.5: 407-413.
  3. CLARK, Kevin, et al. What does BERT look at? An analysis of BERT’s attention. arXiv:1906.04341, 2019.
  4. MIKOLOV, Tomáš, et al. Advances in pre-training distributed word representations. arXiv:1712.09405, 2017.
  5. NOVOTNÝ, Vít, et al. Towards useful word embeddings. RASLAN 2020, 2020, 37.

Introduction

  • Kahneman [1] categorizes human cognition into two systems:
    • System 1: fast, automatic, emotional, unconscious, …
    • System 2: slow, effortful, logical, conscious, …
  • Peters et al. [2] show that both systems are mutually supportive.
  • Clark et al. [3] show that ensembling shallow log-bilinear LMs (FastText) and deep attention-based LMs (BERT) significantly outperforms either on dependency parsing [3, Table 3].
  • Mikolov et al. [4] introduce positional weighting to FastText and achieve state-of-the-art accuracy on the English word analogy task (85%); see the sketch after this list.
  • We open-source positional weighting (pw) and evaluate it on qualitative and extrinsic tasks.
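
A minimal NumPy sketch of positional weighting as described in [4]: each context position p carries a learned weight vector d_p that reweights the context word's input vector element-wise before the CBOW average. All names and sizes below are illustrative, not our released implementation.

    import numpy as np

    rng = np.random.default_rng(0)
    vocab_size, embed_dim, window = 10_000, 300, 5

    # Input (context) word vectors U and one positional weight vector d_p
    # per context position p in [-window, window] \ {0}; both are learned.
    U = rng.normal(scale=0.1, size=(vocab_size, embed_dim))
    D = np.ones((2 * window, embed_dim))

    def context_vector(context_ids):
        """Positionally weighted CBOW context: the mean of d_p * u_{t+p}."""
        return np.mean(D * U[context_ids], axis=0)

    # Hidden vector for one masked position with a full 2*window context.
    h = context_vector(rng.integers(vocab_size, size=2 * window))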

Qualitative Evaluation

  • We measure the importance of context words at position p. Words around the masked word are the most important, and the left context outweighs the right.
  • We cluster the positional features; see the sketch after this list. The clusters boost different kinds of context words:
    • Antepositional: with, under, in, of, …
    • Postpositional: ago, notwithstanding, …
    • Informational: fascism, tornado, …, August
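
A hedged sketch of how such an analysis can be reproduced: the importance of position p is taken as the norm of d_p, and the positional features (columns of D) are clustered by their profiles across positions with k-means. The three clusters mirror the split above; the random D and all names stand in for a trained model.

    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    window, embed_dim = 5, 300
    D = rng.normal(size=(2 * window, embed_dim))  # stand-in for trained d_p

    importance = np.linalg.norm(D, axis=1)  # how much each position p matters
    profiles = D.T                          # per-feature profile across positions
    labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(profiles)
    for k in range(3):
        print(f"cluster {k}: {np.mean(labels == k):.0%} of features")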

Text Classification

  • We use the Soft Cosine Measure with a kNN classifier [5]; a sketch follows below.

Positional model consistently outperforms base FastText.
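
A minimal sketch of the Soft Cosine Measure with a kNN classifier, assuming bag-of-words document vectors and a term-similarity matrix S built from word-vector cosines; this illustrates the technique from [5] rather than reproducing the evaluation code.

    import numpy as np

    def similarity_matrix(word_vectors):
        """Term-similarity matrix: S_ij = cos(w_i, w_j), clipped at zero."""
        V = word_vectors / np.linalg.norm(word_vectors, axis=1, keepdims=True)
        return np.clip(V @ V.T, 0.0, None)

    def soft_cosine(x, y, S):
        """Soft Cosine Measure between bag-of-words vectors x and y under S."""
        den = np.sqrt((x @ S @ x) * (y @ S @ y))
        return (x @ S @ y) / den if den else 0.0

    def knn_classify(query, train_bows, train_labels, S, k=3):
        """Label a query document by a majority vote of its k nearest neighbours."""
        sims = np.array([soft_cosine(query, d, S) for d in train_bows])
        return np.bincount(train_labels[np.argsort(sims)[-k:]]).argmax()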

Language Modeling

  • We use an LSTM whose lookup table is initialized to the context word vectors [5]; a sketch follows below.

Positional model consistently outperforms base FastText.
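
A minimal PyTorch sketch of this setup: an LSTM language model whose embedding lookup table starts from the pretrained context vectors and is fine-tuned during training. The hidden size and class name are illustrative assumptions; [5] describes the actual configuration.

    import torch
    import torch.nn as nn

    class LSTMLanguageModel(nn.Module):
        def __init__(self, pretrained_vectors, hidden_dim=512):
            super().__init__()
            vocab_size, embed_dim = pretrained_vectors.shape
            # Lookup table initialized to the FastText context (input) vectors;
            # freeze=False lets it be fine-tuned during LM training.
            self.embed = nn.Embedding.from_pretrained(
                torch.as_tensor(pretrained_vectors, dtype=torch.float32),
                freeze=False)
            self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
            self.out = nn.Linear(hidden_dim, vocab_size)

        def forward(self, token_ids):
            hidden, _ = self.lstm(self.embed(token_ids))
            return self.out(hidden)  # next-token logits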

(Figure from Qualitative Evaluation: the three clusters of positional features comprise 8%, 11%, and 81% of the features.)