
The language model family

[Diagram: family tree of pre-trained language models. Models shown: Transformer, ELMO, ULMFit, GPT, GPT-2, Grover, BERT, RoBERTa, ALBERT, DistilBERT, Q-BERT, SpanBERT, XLM, XLNet, ERNIE (Tsinghua), MT-DNN, T5. Edges are labelled with how each model changes its predecessor: Bi-directional LM, +Data, +Model size, -Model size, +Train time, +Batchsize, -NSP, Shared weights, NSP → Sentence order, Factorized Embedding, Permutation LM, TransformerXL, +Knowledge graph, Multitask, Text2Text, Masked spans, Crosslingual, Defense.]

deepset.ai

(Inspired by Zhengyan Zhang & Xiaozhi Wang)


What about non-English languages?

→ Community effort!

→ Talk to us for TPU/GPU sponsoring

(hello@deepset.ai)
