Making Generative AI Better for You: Fine-tuning and Experimentation for Custom Research Solutions
Shane Storks (he/him)
PhD Candidate, Computer Science and Engineering
Situated Language and Embodied Dialogue (SLED) Lab
MIDAS Generative AI Tutorial Series
November 29, 2023
Large Language Models (LLMs)
LLMs like ChatGPT and GPT-4 have recently gained popularity due to their impressive language understanding and reasoning capabilities, making them useful assistants for a variety of language tasks.
How can we customize them and apply them to empirical research?
…
Role of LLMs in Research
Outline
Language Models (LMs)
Jack needed some money, so he went and shook his piggy ____
(Figure: an LM assigns each candidate next word, such as "bank", "toy", "tail", "fruit", and "and", a probability between 0.0 and 1.0.)
Minsky, M. (2000). Commonsense-based interfaces. In Commun. ACM, 43(8): p. 66-73.
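To make the idea concrete, here is a minimal sketch that swaps the neural LM for a tiny count-based bigram model. This is an assumption for illustration only, and the corpus is made up. Like the LM in the figure, it maps a context (here just the previous word) to a probability distribution over the next word.

```python
from collections import Counter, defaultdict


def train_bigram_lm(corpus):
    """Count how often each word follows each previous word."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.lower().split()
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def next_word_distribution(counts, prev_word):
    """Turn the counts of words seen after prev_word into probabilities."""
    followers = counts.get(prev_word.lower(), Counter())
    total = sum(followers.values())
    return {word: n / total for word, n in followers.items()} if total else {}


# Hypothetical toy corpus, for illustration only.
corpus = [
    "he shook his piggy bank",
    "she emptied her piggy bank",
    "the piggy bank was full",
    "he fed the piggy toy",
]
lm = train_bigram_lm(corpus)
dist = next_word_distribution(lm, "piggy")
print(dist)  # {'bank': 0.75, 'toy': 0.25}
```

A neural LM plays the same role but conditions on the whole preceding text and generalizes to word sequences it has never seen.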
Vector-Based Word Embeddings
(Timeline, 2013 to 2023: word2vec, GloVe; image from TensorFlow docs)
Tomas Mikolov, Kai Chen, Greg Corrado, & Jeffrey Dean. (2013). “Efficient Estimation of Word Representations in Vector Space.” International Conference on Learning Representations 2013.
Tomas Mikolov, Ilya Sutskever, Kai Chen, et al. (2013). “Distributed Representations of Words and Phrases and their Compositionality.” Advances in Neural Information Processing Systems 26.
Jeffrey Pennington, Richard Socher, & Christopher Manning. (2014). “GloVe: Global Vectors for Word Representation.” 2014 Conference on Empirical Methods in Natural Language Processing.
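word2vec and GloVe represent each word as a dense vector, and related words end up with vectors that point in similar directions. Below is a minimal sketch of comparing such vectors with cosine similarity. The 3-dimensional vectors are made up for illustration; real embeddings are learned from large corpora and typically have hundreds of dimensions.

```python
import math


def cosine_similarity(u, v):
    """cos(u, v) = (u . v) / (|u| |v|); 1.0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Made-up vectors; real word2vec/GloVe vectors are learned from text.
embeddings = {
    "bank":   [0.9, 0.1, 0.3],
    "money":  [0.8, 0.2, 0.4],
    "banana": [0.1, 0.9, 0.2],
}
sim_money = cosine_similarity(embeddings["bank"], embeddings["money"])
sim_banana = cosine_similarity(embeddings["bank"], embeddings["banana"])
print(f"bank~money: {sim_money:.2f}, bank~banana: {sim_banana:.2f}")
```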
Representing Sequences of Words
(Timeline, 2013 to 2023: word2vec, GloVe, RNN LMs)
Rafal Jozefowicz, Wojciech Zaremba, & Ilya Sutskever. (2015). “An Empirical Exploration of Recurrent Network Architectures.” ICML 2015.
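An RNN LM reads a sentence one word at a time and carries a hidden state forward, so each step can depend on everything read so far. Here is a minimal sketch of a vanilla (Elman) RNN forward pass in plain Python. The weights and the toy input vectors are made up, and the output layer that would turn each hidden state into a next-word distribution is omitted.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def rnn_forward(inputs, W_x, W_h, b):
    """Run a vanilla RNN: h_t = tanh(W_x x_t + W_h h_{t-1} + b), with h_0 = 0.

    Returns the hidden state after every step.
    """
    h = [0.0] * len(b)
    states = []
    for x in inputs:
        pre = [a + c + d for a, c, d in zip(matvec(W_x, x), matvec(W_h, h), b)]
        h = [math.tanh(p) for p in pre]
        states.append(h)
    return states


# Made-up weights (hidden size 2, input size 2); a real RNN LM learns these.
W_x = [[0.5, -0.3], [0.2, 0.8]]
W_h = [[0.1, 0.4], [-0.6, 0.3]]
b = [0.0, 0.1]
sequence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three toy word vectors
states = rnn_forward(sequence, W_x, W_h, b)
print(states[-1])
```

Because the hidden state is threaded through every step, feeding the same words in a different order gives a different final state.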
Attention and Transformers
(Timeline, 2013 to 2023: word2vec, GloVe, RNN LMs, attention, transformer)
Dzmitry Bahdanau, Kyunghyun Cho, & Yoshua Bengio. (2015). “Neural Machine Translation by Jointly Learning to Align and Translate.” International Conference on Learning Representations 2015.
Ashish Vaswani, Noam Shazeer, Niki Parmar, et al. (2017). “Attention is All You Need.” Advances in Neural Information Processing Systems 30.
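Attention lets each position build its representation as a weighted average over all positions, with weights given by how well its query matches each key. The sketch below implements scaled dot-product attention, softmax(QKᵀ / √d_k) V, as defined by Vaswani et al. (2017), in plain Python. The Q, K, and V vectors are made up; a Transformer computes them with learned projections of the token representations and runs several attention heads in parallel.

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V, one query row at a time."""
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # Each output is a weighted average of the value vectors.
        out = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


# Made-up 2-d vectors for three tokens.
Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
outputs, weights = scaled_dot_product_attention(Q, K, V)
print(weights)
print(outputs)
```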
Contextual Language Representations
(Timeline, 2013 to 2023: word2vec, GloVe, RNN LMs, attention, transformer, ELMo)
Matthew E. Peters, Mark Neumann, Mohit Iyyer, et al. (2018). “Deep Contextualized Word Representations.” 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies.
Self-Supervision and Transfer Learning in LMs
(Timeline, 2013 to 2023: word2vec, GloVe, RNN LMs, attention, transformer, ELMo, GPT, BERT)
Alec Radford, Karthik Narasimhan, Tim Salimans, & Ilya Sutskever. (2018). “Improving Language Understanding by Generative Pre-Training.”
Jacob Devlin, Ming-Wei Chang, Kenton Lee, & Kristina Toutanova. (2018). “BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding.” 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies.
“Jack needed some money, so he went and shook his piggy …”
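(Figure: a Transformer encoder reads "Jack needed some money, so he went and shook his [MASK]"; a feedforward + softmax layer assigns each candidate for the masked word, such as "fruit", "wallet", "head", "piggy", and "hand", a probability between 0.0 and 1.0.)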
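BERT-style pre-training creates its own labels by hiding words and asking the encoder to recover them. Below is a minimal sketch of the corruption step, following the recipe described in the BERT paper: pick about 15% of the tokens as prediction targets, replace 80% of those with [MASK], 10% with a random word, and leave 10% unchanged. The whitespace tokenizer and the toy vocabulary are simplifying assumptions.

```python
import random

MASK = "[MASK]"


def mask_tokens(tokens, vocab, mask_prob=0.15, seed=0):
    """Return (corrupted tokens, labels).

    labels[i] holds the original token at positions chosen as prediction
    targets and None elsewhere; the encoder is trained to predict the labels.
    """
    rng = random.Random(seed)
    corrupted, labels = list(tokens), [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() >= mask_prob:
            continue  # not a prediction target
        labels[i] = tok
        r = rng.random()
        if r < 0.8:
            corrupted[i] = MASK              # 80%: replace with [MASK]
        elif r < 0.9:
            corrupted[i] = rng.choice(vocab)  # 10%: replace with a random word
        # remaining 10%: keep the original token
    return corrupted, labels


tokens = "Jack needed some money , so he went and shook his piggy".split()
vocab = sorted(set(tokens))
corrupted, labels = mask_tokens(tokens, vocab, mask_prob=0.3)
print(corrupted)
print(labels)
```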
Bigger Data & Bigger Models -> LLMs
(Timeline, 2013 to 2023: word2vec, GloVe, RNN LMs, attention, transformer, ELMo, GPT, BERT, GPT-2, RoBERTa, MegatronLM, Turing-NLG, …)
(Figure from Microsoft, including a "Human Performance" marker)
Alec Radford, Jeff Wu, Rewon Child, et al. (2019). “Language Models are Unsupervised Multitask Learners.”
Corby Rosset. (2020). “Turing-NLG: A 17-billion-parameter language model by Microsoft.”
Prompting & In-Context Learning
(Timeline, 2013 to 2023: word2vec, GloVe, RNN LMs, attention, transformer, ELMo, GPT, BERT, GPT-2, RoBERTa, MegatronLM, Turing-NLG, …, GPT-3)
Instruction Tuning
(Timeline, 2013 to 2023: word2vec, GloVe, RNN LMs, attention, transformer, ELMo, GPT, BERT, GPT-2, RoBERTa, MegatronLM, Turing-NLG, …, GPT-3, FLAN, InstructGPT, ChatGPT)
Jason Wei, et al. (2022). “Finetuned Language Models are Zero-shot Learners.” ICLR 2022.
Long Ouyang, Jeff Wu, Xu Jiang, et al. (2022). “Training Language Models to Follow Instructions with Human Feedback.” arXiv: 2203.02155.
https://chat.openai.com/
Vision & Multimodality
(Timeline, 2013 to 2023: word2vec, GloVe, RNN LMs, attention, transformer, ELMo, GPT, BERT, GPT-2, RoBERTa, MegatronLM, Turing-NLG, …, GPT-3, InstructGPT, ChatGPT, Flamingo, BLIP-2, GPT-4, …)
Jean-Baptiste Alayrac, Jeff Donahue, Pauline Luc, et al. (2022). “Flamingo: a Visual Language Model for Few-Shot Learning.” Advances in Neural Information Processing Systems 35.
Junnan Li, Dongxu Li, Silvio Savarese, & Steven Hoi. (2023). “BLIP-2: Bootstrapping Language-Image Pre-Training with Frozen Image Encoders and Large Language Models.” arXiv: 2301.12597.
OpenAI. (2023). “GPT-4 Technical Report.” arXiv: 2303.08774.
Limitations of LLMs
Limitations of LLMs: Spurious Cues
Karen was assigned a roommate her first year of college. Her roommate asked her to go to a nearby city for a concert. Karen agreed happily. The show was absolutely exhilarating.
How does the story end?
😀 Karen became good friends with her roommate.
😡 Karen hated her roommate.
Limitations of LLMs: Data Contamination
Limitations of LLMs: Interpretability
(figure from Vinay Iyengar)
…
Limitations of LLMs: Hallucination
Summary
2 Ways to Customize LLMs
Fine-Tuning:
Small hardware requirements
Host locally (private, more flexible)
Optimized for specific task
Technical skills, engineering effort
Large amount of training data
Hard to adapt once trained
Prompting:
Larger hardware requirements
Best LMs behind proprietary APIs
Requires prompt engineering
User-friendly language interface
No training data needed
Generalizable and adaptable
Outline
Fine-Tuning: Text Classification
What is the sentiment of this text?
The film was a charming and affecting journey.
Answer choices: Negative, Positive
(Figure: the pre-trained LM encodes the text, a classification head maps the encoding to the scores -0.11 (Negative) and 2.30 (Positive), and a softmax turns these into P(Neg.) and P(Pos.), each between 0.0 and 1.0.)
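The last step of this pipeline is plain arithmetic. This sketch takes the two scores (logits) from the figure as given, since the fine-tuned LM and classification head are not shown, and applies a softmax to get class probabilities.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


labels = ["Negative", "Positive"]
logits = [-0.11, 2.30]  # classification-head outputs shown in the figure
probs = softmax(logits)
prediction = labels[probs.index(max(probs))]

for label, p in zip(labels, probs):
    print(f"P({label}) = {p:.3f}")
print("Prediction:", prediction)
```

With these scores, P(Positive) comes out at roughly 0.92. During fine-tuning, the cross-entropy loss on these probabilities is what updates the classification head (and usually the LM as well).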
Fine-Tuning: Multiple Choice Completion
Which sentence is most likely to fill in the blank?
It was a very hot summer day.
___________________________
He felt much better!
A. He decided to run in the heat.
B. He drank a glass of ice cold water.
(Figure: each choice is filled into the blank and the full story is encoded by the pre-trained LM; a classification head gives each version one score (A: -0.45, B: 3.76), and a softmax over the two scores gives P(A) and P(B).)
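For multiple choice, the model scores one complete text per choice, and the softmax runs across the choices rather than across fixed classes. Here is a sketch of that bookkeeping, using the two scores from the figure in place of a real fine-tuned model.

```python
import math


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


before = "It was a very hot summer day."
after = "He felt much better!"
choices = {"A": "He decided to run in the heat.", "B": "He drank a glass of ice cold water."}

# One full text per choice; a fine-tuned LM + classification head would map
# each text to a single score.
candidates = {key: f"{before} {sentence} {after}" for key, sentence in choices.items()}
scores = {"A": -0.45, "B": 3.76}  # scores shown in the figure

keys = list(candidates)
probs = dict(zip(keys, softmax([scores[k] for k in keys])))
best = max(probs, key=probs.get)
print(candidates[best], probs)
```

Because every choice goes through the same head, the number of choices can vary between examples without changing the model.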
Fine-Tuning: Multiple Choice QA
How many legs does a ladybug have?
A. 2
B. 4
C. 6
(Figure: each answer is paired with the question, as in "Q: How many legs does a ladybug have? A: 2", and encoded by the pre-trained LM; a classification head gives each pair one score (A: -0.05, B: 0.01, C: 3.77), and a softmax over the three scores gives P(A), P(B), and P(C).)
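The same recipe works for question answering: pair the question with each candidate answer, score each pair, and normalize across the options. In this sketch, `score_fn` is a hypothetical stand-in for the fine-tuned LM plus classification head. It simply looks up the figure's scores, assuming -0.05, 0.01, and 3.77 belong to the answers 2, 4, and 6.

```python
import math


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def format_pair(question, option):
    """Pair the question with one candidate answer, as in the figure."""
    return f"Q: {question} A: {option}"


def pick_answer(question, options, score_fn):
    """Score every (question, answer) pair and normalize across the options."""
    pairs = [format_pair(question, o) for o in options]
    probs = softmax([score_fn(p) for p in pairs])
    best = max(range(len(options)), key=lambda i: probs[i])
    return options[best], probs


question = "How many legs does a ladybug have?"
options = ["2", "4", "6"]
# Hypothetical stand-in for the model: a lookup of the figure's scores.
FIGURE_SCORES = {
    format_pair(question, "2"): -0.05,
    format_pair(question, "4"): 0.01,
    format_pair(question, "6"): 3.77,
}
answer, probs = pick_answer(question, options, FIGURE_SCORES.__getitem__)
print(answer, [round(p, 3) for p in probs])
```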
Fine-Tuning: Token Classification
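Label each token with its part of speech (POS):
I see a dog
Labels: Verb, Noun, Determiner, Pronoun
(Figure: the pre-trained LM encodes every token; for each token, a classification head outputs one score per label (for example, -0.72 0.56 0.09 -0.11), and a softmax turns the scores into P(V), P(N), P(P), and P(D).)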
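Token classification applies the same softmax once per token instead of once per text. Here is a minimal sketch with made-up scores (one row per token, one column per label). It assumes one word per token, whereas real LM tokenizers often split words into sub-word pieces whose predictions must be mapped back to words.

```python
import math

LABELS = ["Verb", "Noun", "Determiner", "Pronoun"]


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def tag_tokens(tokens, token_logits):
    """Pick the most probable label for each token independently."""
    tagged = []
    for token, logits in zip(tokens, token_logits):
        probs = softmax(logits)
        tagged.append((token, LABELS[probs.index(max(probs))]))
    return tagged


tokens = ["I", "see", "a", "dog"]
# Made-up scores, columns in LABELS order (Verb, Noun, Determiner, Pronoun).
token_logits = [
    [-0.4, 0.1, -0.2, 2.2],   # I   -> Pronoun
    [1.8, 0.3, -0.5, -0.1],   # see -> Verb
    [-0.6, 0.0, 2.4, 0.2],    # a   -> Determiner
    [0.2, 2.0, -0.3, -0.4],   # dog -> Noun
]
print(tag_tokens(tokens, token_logits))
```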
Fine-Tuning: Text Generation
Continue the text:
Jack shook his piggy …
(Figure: the pre-trained LM reads "Jack shook his piggy"; a language modeling head scores every word in the vocabulary, and a softmax gives the probability of each possible next word, such as "bank", "toy", "tail", "fruit", and "and".)
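Generation repeats next-word prediction: pick a word from the distribution, append it, and feed the longer text back in. Below is a minimal sketch of greedy decoding, which always takes the most probable word. The next-word table is a hypothetical stand-in for the LM and its language modeling head, and unlike a real LM it only looks at the last word.

```python
# Hypothetical next-word probabilities standing in for LM head + softmax.
NEXT_WORD_PROBS = {
    "piggy": {"bank": 0.85, "toy": 0.08, "tail": 0.04, "fruit": 0.02, "and": 0.01},
    "bank": {"and": 0.6, "<eos>": 0.4},
    "and": {"counted": 0.7, "smiled": 0.3},
    "counted": {"coins": 0.9, "<eos>": 0.1},
    "coins": {"<eos>": 1.0},
}


def next_word_distribution(context):
    # Toy stand-in: conditions only on the last word; a real LM uses the whole context.
    return NEXT_WORD_PROBS.get(context[-1], {"<eos>": 1.0})


def greedy_generate(prompt, max_new_tokens=10):
    """Append the most probable next word until <eos> or the length limit."""
    tokens = prompt.split()
    for _ in range(max_new_tokens):
        dist = next_word_distribution(tokens)
        word = max(dist, key=dist.get)
        if word == "<eos>":
            break
        tokens.append(word)
    return " ".join(tokens)


print(greedy_generate("Jack shook his piggy"))
```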
Parameter-Efficient Fine-Tuning (PEFT)
Low-Rank Adaptation (LoRA)
(figure from Sebastian Raschka)
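LoRA keeps a pre-trained weight matrix W (d×k) frozen and learns a low-rank update, so the effective weight is W + (α/r)·BA, with B of shape d×r and A of shape r×k. Here is a plain-Python sketch of that update with small made-up sizes. As in the LoRA formulation, B starts at zero and A starts random, so training begins from exactly the pre-trained model.

```python
import random


def matmul(X, Y):
    """Plain-Python matrix product of X (n x m) and Y (m x p)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def lora_effective_weight(W, A, B, alpha, r):
    """W' = W + (alpha / r) * B @ A; only A and B would be trained."""
    delta = matmul(B, A)
    scale = alpha / r
    return [[w + scale * dw for w, dw in zip(w_row, d_row)] for w_row, d_row in zip(W, delta)]


d, k, r, alpha = 8, 8, 2, 4
rng = random.Random(0)
W = [[rng.gauss(0, 1) for _ in range(k)] for _ in range(d)]     # frozen pre-trained weight
A = [[rng.gauss(0, 0.01) for _ in range(k)] for _ in range(r)]  # trainable, random init
B = [[0.0] * r for _ in range(d)]                               # trainable, zero init

W_eff = lora_effective_weight(W, A, B, alpha, r)
full_params = d * k        # trainable parameters for full fine-tuning of W
lora_params = r * (d + k)  # trainable parameters for LoRA
print(full_params, lora_params)
```

The savings grow with layer size: for d = k = 4096 and r = 8, LoRA trains 8 × (4096 + 4096) = 65,536 parameters instead of 4096 × 4096 = 16,777,216, about 0.4%.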
Outline
Prompting LMs
To customize an LLM for your problem through prompting, you need to make a few choices (prompt engineering):
Prompt Templates
If filling a blank from a few possible choices, you can use a cloze prompt:
Task | Inputs ([X]) | Template | Answer ([Z]) |
Named Entity Recognition (NER) | [X1]: Mike went to Paris; [X2]: Paris | [X1]. [X2] is a [Z] entity. | organization / location / person name / … |
Reading Comprehension | Daniela Hantuchova knocks Venus Williams out of Eastbourne 6-2 5-7 6-2. | [X] Hantuchova breezed through the first set in just under 40 minutes after breaking Williams’ serve twice to take it 6-2 and led the second 4-2 before [Z] hit her stride. | Daniela Hantuchova / Venus Williams |
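Using a cloze prompt amounts to filling the template once per candidate answer and letting the LLM score each filled-in text, for example by its total log-probability. Here is a sketch of the template-filling step. `fill_template` is a hypothetical helper, and the scoring call is left out because it depends on which LLM you use.

```python
def fill_template(template, slots):
    """Replace each [NAME] placeholder with its value, e.g. slots={"X1": ..., "Z": ...}."""
    text = template
    for name, value in slots.items():
        text = text.replace(f"[{name}]", value)
    return text


template = "[X1]. [X2] is a [Z] entity."
inputs = {"X1": "Mike went to Paris", "X2": "Paris"}
answers = ["organization", "location", "person name"]

# One candidate text per answer; an LLM would then score each candidate,
# and the highest-scoring answer is the prediction.
candidates = {z: fill_template(template, {**inputs, "Z": z}) for z in answers}
for z, text in candidates.items():
    print(f"{z:>12}: {text}")
```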
Prompt Templates
When completing a prompt or generating text, use a prefix prompt:
Task | Inputs ([X]) | Template | Answer ([Z]) |
Sentiment Classification | I love this movie. | [X] The movie is [Z] | good / bad |
Question Answering | What color is the sky? A. Red B. Yellow C. Blue D. Green | Question: [X] Answer: [Z] | A / B / C / D |
Summarization | MIDAS and the Michigan AI Lab will host a faculty workshop with the theme of Generative Artificial Intelligence (Generative AI) for research. … | [X] tl;dr [Z] | MIDAS & Michigan AI Lab host faculty workshop on Generative AI for research. Explore impact, use cases, ethical considerations & collaboration opportunities. All faculty welcome. |
Translation | Je vous aime. | French: [X] English: [Z] | I love you. / I fancy you. / … |
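With a prefix prompt, the template is cut at [Z] and the LLM continues the text; a small mapping from answer words to task labels (often called a verbalizer) turns its preferred continuation into a prediction. A sketch under those assumptions follows. The next-word probabilities are made up rather than produced by a real model.

```python
# Hypothetical verbalizer: answer words the LLM might produce -> task labels.
VERBALIZER = {"good": "positive", "bad": "negative"}


def build_prefix_prompt(template, x):
    """Fill [X] and cut the template at [Z], so the LLM continues from there."""
    prefix = template.split("[Z]")[0]
    return prefix.replace("[X]", x).rstrip()


def classify(next_word_probs):
    """Pick the verbalizer word the LLM prefers and map it to its label."""
    best = max(VERBALIZER, key=lambda w: next_word_probs.get(w, 0.0))
    return VERBALIZER[best]


prompt = build_prefix_prompt("[X] The movie is [Z]", "I love this movie.")
# Made-up next-word probabilities an LLM might return for this prompt.
probs = {"good": 0.41, "great": 0.30, "bad": 0.02}
print(prompt, "->", classify(probs))
```

Words outside the verbalizer (here "great") are ignored, which is one reason the choice of answer words matters as much as the template.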
Finding the Best Template and Answers
Input [X]: I love this movie.
LLM: P([Z]=_) for each template (row) and answer (column)
Template | good | great | okay | bad | awful |
[X] The movie was so [Z] | | | | | |
[X] I thought it was [Z] | | | | | |
[X] The movie is [Z] | | | | | |
[X] This movie was [Z] | | | | | |
[X] The film is [Z] | | | | | |
Finding the Best Template and Answers
Template | good | great | okay | bad | awful |
[X] The movie was so [Z] | | ✔ | | | |
[X] I thought it was [Z] | | ✔ | | | |
[X] The movie is [Z] | | ✔ | | | |
[X] This movie was [Z] | | ✔ | | | |
[X] The film is [Z] | ✔ | | | | |
Finding the Best Template and Answers
Template | good | great | okay | bad | awful |
[X] The movie was so [Z] | ∑ | ∑ | ∑ | ∑ | ∑ |
[X] I thought it was [Z] | ∑ | ∑ | ∑ | ∑ | ∑ |
[X] The movie is [Z] | ∑ | ∑ | ∑ | ∑ | ∑ |
[X] This movie was [Z] | ∑ | ∑ | ∑ | ∑ | ∑ |
[X] The film is [Z] | ∑ | ∑ | ∑ | ∑ | ∑ |
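Given a grid of P([Z]=answer) values for each template, you can read off the best answer per template or combine the templates. Here is a sketch with made-up probabilities chosen to reproduce the ✔ pattern ("great" for the first four templates, "good" for "The film is"). It assumes the ∑ step means summing each answer's probability over all templates, a simple prompt ensemble. That is one plausible reading of the slide, not a confirmed method.

```python
templates = [
    "[X] The movie was so [Z]",
    "[X] I thought it was [Z]",
    "[X] The movie is [Z]",
    "[X] This movie was [Z]",
    "[X] The film is [Z]",
]
answers = ["good", "great", "okay", "bad", "awful"]
# Made-up P([Z]=answer | filled template) for X = "I love this movie."
probs = [
    [0.20, 0.45, 0.05, 0.01, 0.01],
    [0.25, 0.40, 0.10, 0.02, 0.01],
    [0.30, 0.35, 0.08, 0.02, 0.01],
    [0.22, 0.38, 0.06, 0.03, 0.02],
    [0.40, 0.30, 0.07, 0.02, 0.01],
]


def argmax(values):
    return max(range(len(values)), key=lambda i: values[i])


# Best answer under each template separately.
best_per_template = {t: answers[argmax(row)] for t, row in zip(templates, probs)}
# Ensemble: sum each answer's probability over all templates.
totals = {a: sum(row[j] for row in probs) for j, a in enumerate(answers)}
ensemble_answer = max(totals, key=totals.get)
print(best_per_template)
print(ensemble_answer)
```

Summing over templates makes the prediction less sensitive to any single template's wording; with a labeled development set, you could instead keep only the template that scores best.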
Managing Randomness in LLMs
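LLM outputs vary from run to run because the next word is sampled from a distribution. Two common knobs are the sampling temperature and the random seed. Below is a minimal sketch over made-up next-word scores. Dividing the scores by the temperature sharpens (T < 1) or flattens (T > 1) the distribution, T = 0 is treated as greedy decoding, and fixing the seed makes the samples reproducible.

```python
import math
import random


def sample_next_word(logits, temperature, rng):
    """Sample a word from softmax(logits / temperature); temperature 0 means greedy."""
    words = list(logits)
    if temperature == 0:
        return max(words, key=logits.get)
    scaled = [logits[w] / temperature for w in words]
    m = max(scaled)
    weights = [math.exp(s - m) for s in scaled]  # unnormalized softmax weights
    return rng.choices(words, weights=weights, k=1)[0]


logits = {"bank": 3.0, "toy": 1.5, "tail": 0.5, "fruit": 0.1}  # made-up scores


def run(seed, temperature, n=10):
    rng = random.Random(seed)  # a dedicated, seeded generator per run
    return [sample_next_word(logits, temperature, rng) for _ in range(n)]


first = run(seed=42, temperature=1.0)
second = run(seed=42, temperature=1.0)
greedy = run(seed=0, temperature=0)
print(first)
print(greedy)
```

Many hosted LLM APIs expose a temperature setting as well; for experiments, it helps to record the temperature (and the seed, where one is available) alongside each result.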
In-Context Learning
Tom B. Brown, Benjamin Mann, Nick Ryder, et al. (2020). “Language Models are Few-Shot Learners.” arXiv: 2005.14165.
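In-context learning, as described by Brown et al. (2020), puts a few labeled demonstrations in the prompt and lets the model infer the task without any weight updates. Here is a sketch of assembling such a few-shot prompt. The `Review:`/`Sentiment:` format and the demonstrations are made-up choices, not a prescribed template.

```python
def build_few_shot_prompt(demonstrations, query, template="Review: {x}\nSentiment: {y}"):
    """Concatenate labeled demonstrations, then the query with its label left blank."""
    blocks = [template.format(x=x, y=y) for x, y in demonstrations]
    blocks.append(template.format(x=query, y="").rstrip())
    return "\n\n".join(blocks)


demos = [
    ("The plot dragged and the acting was wooden.", "negative"),
    ("A charming and affecting journey.", "positive"),
]
prompt = build_few_shot_prompt(demos, "I love this movie.")
print(prompt)
```

The LLM's continuation after the final "Sentiment:" is read as its prediction; changing which demonstrations are included, or their order, can change that prediction.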
Chain-of-Thought Prompting
Jason Wei, Xuezhi Wang, Dale Schuurmans, et al. (2022). “Chain-of-Thought Prompting Elicits Reasoning in Large Language Models.” Advances in Neural Information Processing Systems 35.
Takeshi Kojima, Shixiang Shane Gu, Machel Reid, Yutaka Matsuo, & Yusuke Iwasawa. (2022). “Large Language Models are Zero-Shot Reasoners.” Advances in Neural Information Processing Systems 35.
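Zero-shot chain-of-thought prompting (Kojima et al., 2022) appends "Let's think step by step." so that the model writes out its reasoning before answering. Here is a sketch of building that prompt and pulling a final answer out of the response. The completion is made up, and taking the last number in the text is a simple heuristic assumed here, not the paper's exact answer-extraction procedure.

```python
import re


def zero_shot_cot_prompt(question):
    # Trigger phrase from Kojima et al. (2022).
    return f"Q: {question}\nA: Let's think step by step."


def extract_final_number(completion):
    """Heuristic: treat the last number in the reasoning as the answer."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", completion)
    return numbers[-1] if numbers else None


question = "A ladybug has 6 legs. How many legs do 3 ladybugs have?"
prompt = zero_shot_cot_prompt(question)
# Made-up completion standing in for an LLM's response to the prompt.
completion = "Each ladybug has 6 legs. 3 ladybugs have 3 * 6 = 18 legs. The answer is 18."
print(prompt)
print(extract_final_number(completion))
```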
@shanestorks www.shanestorks.com
Next: From Theory to Practice!
I’m on the job market for academic and industry positions!