Generative Models, LLMs & more
Generative models
Teaching a computer to read: Classification
Teaching a computer to write: Generative
LLMs
Generative models for language
LLMs: A simplified setting
Challenge 1: Words aren’t enough!
Remedy: Use tokens = short character subsequences instead of words
Tokenization
To tokenize, we start with one token per character, then repeatedly add the most common next-shortest subsequences as new tokens.
single-char tokens
two-char tokens
3+ char tokens
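A minimal sketch of this build-up, assuming a byte-pair-encoding-style rule (repeatedly merge the most frequent adjacent pair of tokens). The slides do not name the exact algorithm, so the function `learn_merges`, the tie-breaking rule, and the sample text are illustrative assumptions, not the tokenizer of any particular LLM.

```python
from collections import Counter


def learn_merges(text, num_merges):
    """Start from single-character tokens, then greedily merge frequent pairs.

    Assumption: a BPE-style merge rule. Ties are broken deterministically
    by comparing the pairs themselves, which is an arbitrary choice.
    """
    tokens = list(text)  # single-char tokens
    merges = []
    for _ in range(num_merges):
        pair_counts = Counter(zip(tokens, tokens[1:]))
        if not pair_counts:
            break
        (a, b), count = max(pair_counts.items(), key=lambda kv: (kv[1], kv[0]))
        if count < 2:
            break  # nothing repeats, so no merge is worth adding
        merges.append(a + b)
        # Rewrite the token sequence, replacing each (a, b) pair by one token.
        merged, i = [], 0
        while i < len(tokens):
            if i + 1 < len(tokens) and tokens[i] == a and tokens[i + 1] == b:
                merged.append(a + b)
                i += 2
            else:
                merged.append(tokens[i])
                i += 1
        tokens = merged
    return tokens, merges


tokens, merges = learn_merges("banana bandana", num_merges=5)
print(merges)  # ['an', 'ban', 'ana']
print(tokens)  # ['ban', 'ana', ' ', 'ban', 'd', 'ana']
```

The first merge creates a two-char token ("an"); later merges build 3+ char tokens on top of it, mirroring the progression on the slides.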
Challenge 2: Language is not fixed length
LLMs
But: still largely based on patterns of language
Why is this wrong?
“Given that a ball is red, is it more likely in the left jar or in the right jar?”
“Given the person is republican, are they more likely from California or Louisiana?”
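The jar question can be worked through by counting. The sketch below uses made-up ball counts (an assumption for illustration only; the slides give no numbers): the right jar has a much higher share of red balls, but the left jar is far bigger, so most red balls still sit in the left jar. The second question has the same structure: a strong association with a group does not by itself tell you where most members of that group are.

```python
# Hypothetical counts, chosen only to illustrate the effect of base rates.
left_jar = {"red": 30, "other": 970}   # big jar, only 3% red
right_jar = {"red": 8, "other": 2}     # small jar, 80% red

# Share of red balls inside each jar: P(red | jar).
p_red_given_left = left_jar["red"] / sum(left_jar.values())
p_red_given_right = right_jar["red"] / sum(right_jar.values())

# Where a red ball most likely came from: P(jar | red).
total_red = left_jar["red"] + right_jar["red"]
p_left_given_red = left_jar["red"] / total_red
p_right_given_red = right_jar["red"] / total_red

print(f"P(red | left)  = {p_red_given_left:.2f}")
print(f"P(red | right) = {p_red_given_right:.2f}")
print(f"P(left | red)  = {p_left_given_red:.2f}")
print(f"P(right | red) = {p_right_given_red:.2f}")
```

With these counts, "red" is strongly associated with the right jar, yet a red ball is more likely to come from the left jar.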
Consequences of Language Models
The association between the tokens “Louisiana” and “Republican” is very strong!
And the association between “California” and “Democrat” is very strong!
But: This is also a counterintuitive question that people get wrong ☺
And: “Reasoning” / “Thinking” models get the question right
Be aware of these when using LLMs!
Life Lessons from Data Analysis…
Recent Research: Generative Models + Quantum
Quantum Annealing
Generative Model: Image Denoising
“Noise”: Pixels are damaged in an image
Generative Model:
Determining which pixels are “wrong”:
Catch:
Largest image a quantum computer can handle is…
12 x 12 pixels black and white!! ☹
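To make the idea of finding "wrong" pixels concrete, here is a rough sketch on a 12 x 12 black-and-white image. It assumes an Ising-style energy, E(x) = -beta * (sum over neighboring pixel pairs of x_i * x_j) - eta * (sum over pixels of x_i * y_i), where y is the noisy image and x the restored one, with pixels coded as -1/+1. This is one common textbook formulation, not necessarily the model used in the research, and the solver below is a plain classical greedy update standing in for the quantum annealer. The names `denoise`, `beta`, and `eta`, and the test image, are all illustrative assumptions.

```python
def neighbors(r, c, n_rows, n_cols):
    """Yield the up/down/left/right neighbors that lie inside the image."""
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        rr, cc = r + dr, c + dc
        if 0 <= rr < n_rows and 0 <= cc < n_cols:
            yield rr, cc


def denoise(noisy, beta=1.0, eta=1.5, max_sweeps=10):
    """Greedily lower the Ising-style energy one pixel at a time.

    Classical stand-in (assumption): a quantum annealer would instead
    search for a low-energy configuration of the same kind of energy.
    """
    n_rows, n_cols = len(noisy), len(noisy[0])
    x = [row[:] for row in noisy]
    for _ in range(max_sweeps):
        changed = False
        for r in range(n_rows):
            for c in range(n_cols):
                # Pull from the neighbors (smoothness) plus pull from the
                # observed pixel (stay close to the noisy image).
                field = beta * sum(x[rr][cc] for rr, cc in neighbors(r, c, n_rows, n_cols))
                field += eta * noisy[r][c]
                best = 1 if field > 0 else -1 if field < 0 else x[r][c]
                if best != x[r][c]:
                    x[r][c] = best
                    changed = True
        if not changed:
            break
    return x


SIZE = 12
# Clean image: left half black (-1), right half white (+1).
clean = [[-1 if c < SIZE // 2 else 1 for c in range(SIZE)] for _ in range(SIZE)]

# "Noise": damage three isolated pixels by flipping them.
noisy = [row[:] for row in clean]
for r, c in [(2, 2), (5, 9), (9, 3)]:
    noisy[r][c] *= -1

restored = denoise(noisy)
wrong = [(r, c) for r in range(SIZE) for c in range(SIZE) if restored[r][c] != noisy[r][c]]
print(wrong)              # pixels the model judged to be "wrong"
print(restored == clean)  # True
```

Larger `beta` trusts the neighbors more (smoother output); larger `eta` trusts the observed pixels more.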
Lots of interesting things still being developed!