RFP: On The Bias Towards Base-10 Numbers In Language Models

  • I recently ran a non-scientific experiment in which a BERT-based NMT model was used to translate non-Base-10 number words.
  • The usual transfer-learning methods did not work well.
  • Due to very limited examples and time, the experiment was not rigorous.
  • Proposal: create two artificial languages, each with two different methods of generating number words, and put them through NMT models using the same process (a generator sketch follows this list).
    • Language A has Base-10 numbers, while Language B has Base-24 numbers.
    • Two different, consistent methods of “generating” number words (e.g. “twenty four” vs “four and twenty”).
  • The architecture should be something like Zhu, Xia et al. (2020), but far simpler (maybe just a single decoder layer? a model sketch follows this list).
  • Questions
    • Is my initial finding that BERT models have a Base-10 number bias valid?
    • Does this finding also apply to other large language models?
    • What if we trained from scratch (i.e. is this bias baked into modern NMT architectures)?
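
A concrete (but purely illustrative) sketch of the number-word generators, in Python. The digit names, the "-ty"/"-kai" multiplier suffixes, and which ordering each language uses in the demo are placeholder assumptions, not part of the proposal; only the structure (base plus word order) matters.

    DIGITS_10 = ["zero", "one", "two", "three", "four",
                 "five", "six", "seven", "eight", "nine"]
    # Base-24 needs 24 digit names; the last four are invented placeholders.
    DIGITS_24 = DIGITS_10 + ["ten", "eleven", "twelve", "thirteen", "fourteen",
                             "fifteen", "sixteen", "seventeen", "eighteen",
                             "nineteen", "vash", "nilo", "kera", "sumi"]

    def spell(n: int, digits: list[str], suffix: str, units_first: bool) -> str:
        """Spell 0 <= n < base**2 with a deliberately regular rule.

        units_first=False -> "twoty four"      (cf. "twenty four")
        units_first=True  -> "four and twoty"  (cf. "four and twenty")
        """
        base = len(digits)
        hi, lo = divmod(n, base)
        if hi == 0:
            return digits[lo]
        hi_word = digits[hi] + suffix          # e.g. "twoty", "twokai"
        if lo == 0:
            return hi_word
        lo_word = digits[lo]
        return f"{lo_word} and {hi_word}" if units_first else f"{hi_word} {lo_word}"

    # Language A (Base-10) and Language B (Base-24); either language can use
    # either ordering, giving four corpus variants in total.
    for n in (7, 24, 57, 95):
        print(f"{n:>2}",
              "A:", spell(n, DIGITS_10, "ty", units_first=False), "|",
              "B:", spell(n, DIGITS_24, "kai", units_first=True))

Pairing each integer with its spelling in Language A and Language B (under either ordering) yields the parallel corpora for the NMT runs.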

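A minimal model sketch for the architecture bullet, assuming PyTorch and a HuggingFace transformers encoder; the checkpoint name, hyperparameters, and the single decoder layer are placeholders rather than a committed design, aiming only at the far simpler end of the BERT-fused setup of Zhu, Xia et al. (2020). Swapping the pretrained encoder for a randomly initialised one gives the train-from-scratch variant raised in the questions.

    import torch
    import torch.nn as nn
    from transformers import AutoModel  # assumed dependency: HuggingFace transformers

    class TinyNumberNMT(nn.Module):
        """BERT-style encoder feeding a single Transformer decoder layer."""

        def __init__(self, tgt_vocab: int, d_model: int = 768, nhead: int = 8):
            super().__init__()
            self.encoder = AutoModel.from_pretrained("bert-base-cased")  # placeholder checkpoint
            self.tgt_embed = nn.Embedding(tgt_vocab, d_model)
            layer = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
            self.decoder = nn.TransformerDecoder(layer, num_layers=1)  # "just a single decoder layer"
            self.out = nn.Linear(d_model, tgt_vocab)

        def forward(self, src_ids, src_mask, tgt_ids):
            # Source number words go through the (optionally frozen) encoder.
            memory = self.encoder(input_ids=src_ids,
                                  attention_mask=src_mask).last_hidden_state
            # Causal mask: each target position attends only to earlier positions.
            T = tgt_ids.size(1)
            causal = torch.triu(torch.full((T, T), float("-inf"),
                                           device=tgt_ids.device), diagonal=1)
            hidden = self.decoder(self.tgt_embed(tgt_ids), memory, tgt_mask=causal)
            return self.out(hidden)  # logits over the artificial target vocabulary

Training this on the A↔B corpora, with and without the pretrained encoder weights, would bear directly on the last two questions.
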
Requester: Xuanyi Chew, chewxy@gmail.com