GenAI Workshop at Red Cross/Red Crescent
Geneva, 17 August 2023
Nicolò Tamagnone
Public
first of all, let’s demystify …
Systems like ChatGPT (or similar) that can generate human-like text and engage in strikingly natural conversations have seized the public imagination and have also drawn criticism, even fear in some cases
Yes, BUT
$$L_1(U) = \sum_i \log P(u_i \mid u_{i-k}, \ldots, u_{i-1}; \Theta)$$
where k is the size of the context window, and the conditional probability P is modeled using a neural network (transformer decoder) with parameters Θ, given an unsupervised corpus of tokens U = {u_1, …, u_n}.
Source: Improving Language Understanding by Generative Pre-Training, OpenAI, 2018
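To make the objective concrete: it sums the log-probability of each token given at most k preceding tokens. The sketch below is only an illustration of that sum. It swaps the transformer decoder for a toy count-based model with add-alpha smoothing, and the corpus, the function names (`train_counts`, `log_likelihood`) and the `alpha` parameter are assumptions made for this example, not part of the cited paper.

```python
import math
from collections import Counter, defaultdict

def train_counts(tokens, k):
    """Count how often each token follows each context of up to k previous tokens."""
    counts = defaultdict(Counter)
    for i, tok in enumerate(tokens):
        context = tuple(tokens[max(0, i - k):i])
        counts[context][tok] += 1
    return counts

def log_likelihood(tokens, counts, k, vocab_size, alpha=1.0):
    """Sum over i of log P(u_i | u_{i-k}, ..., u_{i-1}), using smoothed counts.

    A real LLM would compute P with a neural network; the count table here
    is a stand-in so the example runs with the standard library only.
    """
    total = 0.0
    for i, tok in enumerate(tokens):
        context = tuple(tokens[max(0, i - k):i])
        ctx_counts = counts.get(context, Counter())
        # Add-alpha smoothing keeps unseen tokens from getting probability 0.
        p = (ctx_counts[tok] + alpha) / (sum(ctx_counts.values()) + alpha * vocab_size)
        total += math.log(p)
    return total

# Hypothetical toy corpus, chosen only for illustration.
corpus = "the cat sat on the mat the cat ate".split()
k = 2  # context window size
counts = train_counts(corpus, k)
vocab_size = len(set(corpus))

seen = log_likelihood("the cat sat".split(), counts, k, vocab_size)
shuffled = log_likelihood("sat the cat".split(), counts, k, vocab_size)
print(f"log-likelihood, word order seen in corpus: {seen:.3f}")
print(f"log-likelihood, shuffled word order:       {shuffled:.3f}")
```

Pre-training adjusts the parameters Θ to push this sum as high as possible over the whole corpus, which is why a sequence resembling the training text scores higher than a shuffled one.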
Why did the LLM hype start only now?
2018
1. From 2018 to 2022, language model dataset size increased by more than 3 orders of magnitude
2. Every year model size increases by 10x
GPT-4 (size undisclosed; estimated at more than 1 trillion parameters. This would make it more than 600 times larger than GPT-2, at 1.5 billion parameters, and more than 5 times larger than GPT-3, at 175 billion parameters)
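The size comparison can be checked with a few lines of arithmetic. This is a rough sketch: the GPT-2 (1.5 billion) and GPT-3 (175 billion) parameter counts are the published figures, while the GPT-4 value is only the slide's unofficial estimate, taken here at its lower bound of 1 trillion as an assumption.

```python
import math

# Published parameter counts for GPT-2 and GPT-3; the GPT-4 number is an
# unconfirmed estimate (lower bound of "more than 1 trillion").
params = {
    "GPT-2": 1.5e9,
    "GPT-3": 175e9,
    "GPT-4 (estimate)": 1e12,
}

gpt4 = params["GPT-4 (estimate)"]
ratio_vs_gpt2 = gpt4 / params["GPT-2"]
ratio_vs_gpt3 = gpt4 / params["GPT-3"]

print(f"GPT-4 estimate vs GPT-2: {ratio_vs_gpt2:.0f}x "
      f"({math.log10(ratio_vs_gpt2):.1f} orders of magnitude)")
print(f"GPT-4 estimate vs GPT-3: {ratio_vs_gpt3:.1f}x "
      f"({math.log10(ratio_vs_gpt3):.1f} orders of magnitude)")
```

A larger GPT-4 estimate scales both ratios proportionally, so the jump from GPT-2 stays close to three orders of magnitude while the jump from GPT-3 stays under one.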
The evolutionary tree of modern LLMs.
Nowadays many research papers and articles use the term LLM mainly to refer to autoregressive models (the decoder-only branch)
better to make a distinction…
Pre-trained LLM
Fine-tuned LLM
Language Model objective (next token prediction) is different from the objective “follow the user’s instructions helpfully and safely”
Instruction Tuning (Supervised) and Reinforcement Learning from Human Feedback (RLHF) after pre-training
Limitations
Hallucination
Hallucination in LLMs refers to the fact that, for a given input, a model can output falsehoods, contradictions, or nonsense that were not present in its input or training dataset
Bias & Falsehoods
LLMs can perpetuate bias and propagate falsehoods, as they are trained on diverse and poorly filtered data from the internet
Learned
Data Cut-Off
Parametric knowledge is limited to the data used in the pre-training phase. This also raises the issue of knowledge conflicts, where contextual information contradicts the learned knowledge
Missing Knowledge
Intrinsic
The best thing about AI is its ability to …
Even at the first step there are a lot of possible “next words” to choose from, though their probabilities fall off quite quickly.
The straight line on this log-log plot corresponds to an n^-1 “power-law” decay that’s very characteristic of the general statistics of language.
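The link between a power-law decay and a straight line on a log-log plot can be verified numerically. The sketch below is illustrative only: it builds a synthetic distribution in which the n-th most likely next word has probability proportional to 1/n (it does not use real model output), and the function names and the 1000-word candidate list are assumptions for the example.

```python
import math
import random

def power_law_probs(n_words, exponent=1.0):
    """Probabilities proportional to rank^(-exponent), normalised to sum to 1."""
    weights = [rank ** -exponent for rank in range(1, n_words + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def loglog_slope(probs):
    """Least-squares slope of log(probability) against log(rank)."""
    xs = [math.log(rank) for rank in range(1, len(probs) + 1)]
    ys = [math.log(p) for p in probs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var

# Synthetic ranking of 1000 hypothetical candidate next words.
probs = power_law_probs(1000)
slope = loglog_slope(probs)
print(f"slope on the log-log plot: {slope:.3f}")
print(f"probability of rank 1 vs rank 10: {probs[0]:.4f} vs {probs[9]:.4f}")

# Sampling a few next-word ranks: low ranks dominate, but the tail still appears.
rng = random.Random(0)
sampled_ranks = rng.choices(range(1, len(probs) + 1), weights=probs, k=5)
print("sampled ranks:", sampled_ranks)
```

Because log p = -log n + constant for this distribution, the fitted slope comes out at -1, which is the straight line the slide describes.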