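The language model family

[Figure: family tree of pretrained language models. Each model is labeled with what it changes relative to its predecessor.]

Models shown: ELMo, ULMFiT, GPT, GPT-2, Grover, BERT, XLM, RoBERTa, DistilBERT, Q-BERT, ALBERT, XLNet, ERNIE (Tsinghua), MT-DNN, T5, SpanBERT

Changes labeled on the diagram:
GPT: Transformer
BERT: Bi-directional LM
GPT-2: +Data, +Model size
Grover: Defense
XLM: Crosslingual
RoBERTa: +Data, +Train time, +Batch size, -NSP
DistilBERT, Q-BERT: -Model size
ALBERT: Shared weights, NSP → Sentence order, Factorized embedding
XLNet: Permutation LM, Transformer-XL, +Data
ERNIE (Tsinghua): +Knowledge graph
MT-DNN: Multitask
T5: Text2Text, +Data
SpanBERT: Masked spans
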
deepset.ai
(Inspired by Zhengyan Zhang & Xiaozhi Wang)
What about non-English languages?
→ Community effort!
→ Talk to us for TPU/GPU sponsoring (hello@deepset.ai)