RFP: On The Bias Towards Base-10 Numbers In Language Models
- I recently ran a non-scientific experiment in which a BERT-based NMT model was used to translate non-Base-10 number words.
- The usual transfer-learning methods did not work well.
- Due to very limited examples and limited time, the experiment was not rigorous.
- Proposal: create two artificial languages, each with a different method of forming number words, and put both through NMT models using the same training process.
- Language A has Base-10 numbers, while Language B has Base-24 numbers.
- Two different but internally consistent methods of “generating” number words (e.g. “twenty four” vs “four and twenty”); see the generation sketch after this list.
- The architecture should be something like Zhu, Xia et al. (2020), but far simpler (perhaps just a single decoder layer?); see the model sketch after this list.
- Questions
- Is my initial finding that BERT models have a base-10 number bias valid?
- Does this finding also apply to other big language models?
- What if we trained from scratch (i.e. is this bias baked into modern NMT architectures)?
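As a concrete illustration of the number-word generation step, here is a minimal Python sketch. Everything in it is an assumption for illustration: the digit roots for Language B (“kaa”, “kab”, …) are invented placeholders, and `to_words` simply encodes the two ordering rules (“twenty four” vs “four and twenty”) for numbers below base².

```python
# Minimal sketch (not from the original note): generate number words for the two
# artificial languages. All word roots here are invented placeholders; the only
# requirement is that each language uses one base and one consistent ordering rule.

DIGITS_B10 = ["zero", "one", "two", "three", "four",
              "five", "six", "seven", "eight", "nine"]
# Hypothetical roots for the 24 digits of Language B: "kaa", "kab", ..., "kax".
DIGITS_B24 = ["ka" + chr(ord("a") + i) for i in range(24)]

def to_words(n, base, digits, order="high-first"):
    """Spell 0 <= n < base**2 either "twenty four"-style (high-first)
    or "four and twenty"-style (low-first)."""
    hi, lo = divmod(n, base)
    if hi == 0:
        return digits[lo]
    tens = digits[hi] + "-ty"              # e.g. "two-ty" standing in for "twenty"
    if lo == 0:
        return tens
    if order == "high-first":
        return f"{tens} {digits[lo]}"      # "twenty four" style
    return f"{digits[lo]} and {tens}"      # "four and twenty" style

# Example parallel lines: Language A (base 10, high-first) vs Language B (base 24, low-first).
for n in (7, 24, 95):
    print(to_words(n, 10, DIGITS_B10, "high-first"),
          "|||",
          to_words(n, 24, DIGITS_B24, "low-first"))
```

Pairs like these could be emitted as a synthetic parallel corpus, with the same pipeline applied to both languages so that only the base and the ordering rule differ between conditions.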
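And a rough sketch of the kind of simplified model the architecture bullet has in mind: a frozen, pretrained BERT encoder (via HuggingFace `transformers`) feeding a single PyTorch `TransformerDecoderLayer`. This is only an assumption about what “far simpler than Zhu, Xia et al. (2020)” might look like; the class name `TinyBertNMT` and all hyperparameters are placeholders, and the BERT-fused attention of that paper is not reproduced here.

```python
# Rough sketch (an assumption, not the proposed implementation): a frozen BERT
# encoder feeding a single Transformer decoder layer, i.e. a drastically
# simplified stand-in for the model of Zhu, Xia et al. (2020).
import torch
import torch.nn as nn
from transformers import BertModel

class TinyBertNMT(nn.Module):
    def __init__(self, tgt_vocab_size, d_model=768, nhead=8):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        for p in self.bert.parameters():
            p.requires_grad = False         # keep the pretrained encoder frozen
        self.tgt_embed = nn.Embedding(tgt_vocab_size, d_model)
        self.decoder = nn.TransformerDecoderLayer(d_model=d_model, nhead=nhead,
                                                  batch_first=True)
        self.out = nn.Linear(d_model, tgt_vocab_size)

    def forward(self, src_ids, src_mask, tgt_ids):
        # BERT encodes the source number words; gradients do not flow into it.
        memory = self.bert(input_ids=src_ids, attention_mask=src_mask).last_hidden_state
        tgt = self.tgt_embed(tgt_ids)
        causal = nn.Transformer.generate_square_subsequent_mask(tgt_ids.size(1)).to(tgt.device)
        dec = self.decoder(tgt, memory, tgt_mask=causal)
        return self.out(dec)                # logits over the target vocabulary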
Requester: Xuanyi Chew, chewxy@gmail.com