1 of 9

GenAI Workshop at Red Cross/Red Crescent

Geneva, 17 August 2023

Nicolò Tamagnone

Public

2 of 9

first of all, let’s demystify …

Systems like ChatGPT (or similar) that can generate human-like text and engage in strikingly natural conversations have seized the public imagination and drawn criticism, and in some cases even fear.

Yes, BUT

$$L_1(U) = \sum_i \log P(u_i \mid u_{i-k}, \ldots, u_{i-1}; \Theta)$$

where k is the size of the context window, and the conditional probability P is modeled using a neural network (a transformer decoder) with parameters Θ, given an unsupervised corpus of tokens U = {u_1, …, u_n}.

Source: Improving Language Understanding by Generative Pre-Training, OpenAI, 2018
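As a rough illustration of the objective described above, the sketch below sums the log-probability of each token given at most k preceding tokens. The vocabulary, the corpus, and the `toy_prob` function are hypothetical stand-ins: `toy_prob` simply returns a uniform probability, whereas a real system would use the transformer decoder with parameters Θ.

```python
import math

VOCAB = ["the", "cat", "sat", "on", "mat"]  # hypothetical toy vocabulary


def toy_prob(token, context):
    """Hypothetical stand-in for P(u_i | context; Theta): uniform over VOCAB."""
    return 1.0 / len(VOCAB)


def lm_log_likelihood(tokens, k, prob=toy_prob):
    """Sum of log P(u_i | u_{i-k}, ..., u_{i-1}) over all tokens."""
    total = 0.0
    for i, token in enumerate(tokens):
        context = tokens[max(0, i - k):i]  # at most k previous tokens
        total += math.log(prob(token, context))
    return total


corpus = ["the", "cat", "sat", "on", "the", "mat"]  # hypothetical corpus U
score = lm_log_likelihood(corpus, k=3)
```

Pre-training adjusts the parameters so that this sum becomes as large as possible on the training corpus; with the uniform stand-in, every token contributes the same amount.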

3 of 9

why did the LLM hype start only now?

The training dataset size of language models has increased by 0.23 orders of magnitude per year

2018

From 2018 to 2022, the dataset size of language models increased by more than 3 orders of magnitude
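To make the figures above easier to read, the following sketch converts "orders of magnitude per year" into a plain yearly multiplier. The function names are illustrative, and treating "more than 3 orders" as exactly 3 is an assumption made only to get a lower bound.

```python
def yearly_factor(orders_per_year):
    """Convert a growth rate in orders of magnitude per year to a multiplier."""
    return 10 ** orders_per_year


def orders_per_year(total_orders, years):
    """Average growth rate, in orders of magnitude per year, over a period."""
    return total_orders / years


# Stated trend: 0.23 orders of magnitude per year.
trend_factor = yearly_factor(0.23)

# 2018 to 2022: assumed to be exactly 3 orders of magnitude (a lower bound).
recent_rate = orders_per_year(3, 2022 - 2018)
recent_factor = yearly_factor(recent_rate)

print(f"trend: x{trend_factor:.2f} per year")
print(f"2018-2022: {recent_rate:.2f} orders/year, x{recent_factor:.2f} per year")
```

Under these assumptions, the stated trend corresponds to datasets growing by about 1.7x each year, while the 2018 to 2022 period corresponds to at least about 5.6x each year.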

4 of 9

2. Every year model size increases by 10x

GPT-4: size undisclosed; estimates put it at more than 1 trillion parameters. At 1 trillion parameters, it would be roughly 700 times larger than GPT-2 (1.5 billion parameters) and about 6 times larger than GPT-3 (175 billion parameters).

5 of 9

The evolutionary tree of modern LLMs.

Nowadays, many research papers and articles use the term LLM mainly to refer to models with an auto-regressive architecture (the decoder-only branch).

6 of 9

better to make a distinction…

Pre-trained LLM

GPT (3-4)
PaLM (1-2)
LLaMA (1-2)
Falcon
…

Fine-tuned LLM

ChatGPT
Bard
Alpaca (LLaMA 1)
Falcon-instruct
Claude (1-2)
…

The language model objective (next-token prediction) is different from the objective “follow the user’s instructions helpfully and safely”.

Instruction tuning (supervised) and Reinforcement Learning from Human Feedback (RLHF) are applied after pre-training to narrow this gap.

7 of 9

Limitations

Hallucination

Hallucination in LLMs refers to the fact that, for any given input, a model can output falsehoods, contradictions, or outright nonsense that were not present in its input or training dataset.

Bias & Falsehoods

LLMs can perpetuate bias and propagate falsehoods, as they are trained on diverse and poorly filtered data from the internet.

Learned

Data Cut-Off

Parametric knowledge is limited to the data used in the pre-training phase. This raises the issue of knowledge conflicts, where the contextual information contradicts the learned knowledge.

Missing Knowledge

Intrinsic

9 of 9

The best thing about AI is its ability to …

Even at the first step, there are a lot of possible “next words” to choose from, though their probabilities fall off quite quickly.

The straight line on this log-log plot corresponds to an n^–1 “power-law” decay that’s very characteristic of the general statistics of language.
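A minimal sketch of what the n^–1 decay means in practice: it builds a synthetic ranked list of next-word probabilities proportional to 1/rank, checks that the log-log slope is -1, and samples one word. The candidate words, their count, and the random seed are hypothetical; a real model would produce its own probabilities for actual words.

```python
import math
import random


def power_law_probs(n_words, exponent=1.0):
    """Probabilities proportional to rank^(-exponent), normalized to sum to 1."""
    weights = [rank ** -exponent for rank in range(1, n_words + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def loglog_slope(probs):
    """Least-squares slope of log(probability) against log(rank)."""
    xs = [math.log(rank) for rank in range(1, len(probs) + 1)]
    ys = [math.log(p) for p in probs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


probs = power_law_probs(50)
slope = loglog_slope(probs)  # -1 up to floating-point error, by construction

# Hypothetical placeholder candidates, ranked from most to least likely.
candidates = [f"word_{rank}" for rank in range(1, 51)]
rng = random.Random(0)  # fixed seed, only for repeatability
next_word = rng.choices(candidates, weights=probs, k=1)[0]
```

Sampling in proportion to these probabilities, rather than always taking the top-ranked word, is what lets lower-ranked words appear occasionally.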
