Introduction to Transformer-Based Language Models
Presenters: Bishnu Sarker, Sayane Shome
Date: 17-18 July, 2023
Learning Objectives of the session
Understanding the fundamental concepts behind transformers, including the ProtTrans family of protein language models.
What is Language Modeling?
Goal: compute the probability of a sentence or sequence of words: P(W) = P(w1, w2, w3, w4, w5, …, wn)
Related task: the probability of an upcoming word: P(w5 | w1, w2, w3, w4)
A model that computes P(W) or P(wn | w1, w2, …, wn-1) is called a language model.
Jurafsky, D. and Martin, J.H. (2023) Speech and Language Processing. https://web.stanford.edu/~jurafsky/slp3/
The Chain Rule applied to compute the joint probability of words in a sentence:
P(w1, w2, …, wn) = P(w1) × P(w2 | w1) × P(w3 | w1, w2) × … × P(wn | w1, …, wn-1)
P(“its water is so transparent”) = P(its) × P(water | its) × P(is | its water)
× P(so | its water is) × P(transparent | its water is so)
Jurafsky, D. and Martin, J.H. (2023) Speech and Language Processing. https://web.stanford.edu/~jurafsky/slp3/
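A minimal sketch of the chain-rule decomposition in code. The conditional probabilities below are made-up placeholder values, purely to show how the factors multiply together; they are not estimates from any real corpus.

```python
# Chain-rule decomposition of P("its water is so transparent").
# The conditional probabilities here are hypothetical placeholder values,
# used only to illustrate how the factors multiply together.
conditionals = {
    "P(its)": 0.02,
    "P(water | its)": 0.10,
    "P(is | its water)": 0.30,
    "P(so | its water is)": 0.05,
    "P(transparent | its water is so)": 0.01,
}

p_sentence = 1.0
for factor, prob in conditionals.items():
    p_sentence *= prob          # multiply the factors as the chain rule dictates
    print(f"{factor} = {prob}")

print("P('its water is so transparent') =", p_sentence)
```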
Markov Assumption
In other words, we approximate each component in the product using only the last few words of context:
P(wi | w1, …, wi-1) ≈ P(wi | wi-k, …, wi-1)
For a bigram model this becomes P(wi | w1, …, wi-1) ≈ P(wi | wi-1).
Jurafsky, D. and Martin, J.H. (2023) Speech and Language Processing. https://web.stanford.edu/~jurafsky/slp3/
Simplest case: Unigram model
fifth, an, of, futures, the, an, incorporated, a, a, the, inflation, most, dollars, quarter, in, is, mass
thrift, did, eighty, said, hard, 'm, july, bullish
that, or, limited, the
Some automatically generated sentences from a Unigram model
Jurafsky, D. and Martin, J.H. (2023) Speech and Language Processing. https://web.stanford.edu/~jurafsky/slp3/
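A minimal sketch of how such word salads arise: a unigram model ignores all context and samples each word independently from corpus-wide word frequencies. The tiny "corpus" below is a placeholder, not the newswire data behind the slide.

```python
import random
from collections import Counter

# Toy corpus standing in for the training data (placeholder text).
corpus = "the inflation rose and the dollars fell while the market said inflation is hard".split()

# Unigram model: P(w) = count(w) / total number of tokens.
counts = Counter(corpus)
total = sum(counts.values())
words = list(counts)
probs = [counts[w] / total for w in words]

# Generate a "sentence" by sampling each word independently of the previous ones.
random.seed(0)
print(" ".join(random.choices(words, weights=probs, k=10)))
```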
Bigram model
texaco, rose, one, in, this, issue, is, pursuing, growth, in, a, boiler, house, said, mr., gurria, mexico, 's, motion, control, proposal, without, permission, from, five, hundred, fifty, five, yen
outside, new, car, parking, lot, of, the, agreement, reached
this, would, be, a, record, november
Some automatically generated sentences from a Bigram model
Jurafsky, D. and Martin, J.H. (2023) Speech and Language Processing. https://web.stanford.edu/~jurafsky/slp3/
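A minimal sketch of the bigram case, again on a placeholder toy corpus: each word is conditioned on the single preceding word, with P(wi | wi-1) estimated by maximum likelihood from bigram counts.

```python
import random
from collections import defaultdict, Counter

# Toy corpus (placeholder); "<s>" marks sentence starts.
sentences = [
    "<s> texaco rose in this issue".split(),
    "<s> this would be a record november".split(),
    "<s> this issue is pursuing growth in mexico".split(),
]

# MLE bigram estimates: P(w | prev) = count(prev, w) / count(prev).
bigram_counts = defaultdict(Counter)
for sent in sentences:
    for prev, cur in zip(sent, sent[1:]):
        bigram_counts[prev][cur] += 1

def sample_next(prev):
    """Sample the next word from P(. | prev)."""
    options = bigram_counts[prev]
    words, weights = zip(*options.items())
    return random.choices(words, weights=weights)[0]

random.seed(0)
word, generated = "<s>", []
for _ in range(8):
    if word not in bigram_counts:   # dead end: no observed continuation
        break
    word = sample_next(word)
    generated.append(word)
print(" ".join(generated))
```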
N-gram models
An N-gram model conditions only on the previous N-1 words, so it cannot capture long-distance dependencies, such as the link between “computer” and “crashed” in:
“The computer which I had just put into the machine room on the fifth floor crashed.”
Jurafsky, D. and Martin, J.H. (2023) Speech and Language Processing. https://web.stanford.edu/~jurafsky/slp3/
Neural Language Models (LMs)
Jurafsky, D. and Martin, J.H. (2023) Speech and Language Processing. https://web.stanford.edu/~jurafsky/slp3/
Simple Feedforward Neural Language Models
Jurafsky, D. and Martin, J.H. (2023) Speech and Language Processing. https://web.stanford.edu/~jurafsky/slp3/
Neural Language Model
Jurafsky, D. and Martin, J.H. (2023) Speech and Language Processing. https://web.stanford.edu/~jurafsky/slp3/
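A minimal PyTorch sketch of the feedforward neural language model described in Jurafsky and Martin: the previous N-1 words are mapped to embeddings, concatenated, passed through a hidden layer, and a softmax over the vocabulary predicts the next word. The sizes and names below are illustrative choices, not values from the slides.

```python
import torch
import torch.nn as nn

class FeedForwardLM(nn.Module):
    """Fixed-window neural LM: embed N-1 context words, concatenate, predict the next word."""
    def __init__(self, vocab_size, context_size=3, embed_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.hidden = nn.Linear(context_size * embed_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, context_ids):
        # context_ids: (batch, context_size) word indices
        e = self.embed(context_ids)                 # (batch, context_size, embed_dim)
        h = torch.relu(self.hidden(e.flatten(1)))   # concatenate embeddings, apply hidden layer
        return self.out(h)                          # logits over the vocabulary (softmax applied in the loss)

# Tiny usage example with random indices (illustrative only).
model = FeedForwardLM(vocab_size=1000)
context = torch.randint(0, 1000, (4, 3))            # batch of 4 three-word contexts
logits = model(context)                              # (4, 1000)
print(logits.shape)
```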
Why Neural LMs work better than N-gram LMs
Jurafsky, D. and Martin, J.H. (2023) Speech and Language Processing. https://web.stanford.edu/~jurafsky/slp3/
Transformers-based Large Language Model
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... & Polosukhin, I. (2017). Attention is all you need. Advances in neural information processing systems, 30.
Transformers-based Large Language Model
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... & Polosukhin, I. (2017). Attention is all you need. Advances in neural information processing systems, 30.
Encoder:
Embedding Layer
Position Encoding
Multi-Head Attention Layer
Feed Forward Layer
Decoder:
Multi-Head Attention Layer
Encoder-Decoder Attention Layer
Feed Forward Neural Network
Output:
Linear Layer
Softmax Layer
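A minimal PyTorch sketch of how these pieces stack up, using the built-in nn.TransformerEncoderLayer rather than the full encoder-decoder of Vaswani et al.; d_model = 512 and 8 heads follow the original paper, while the batch size, sequence length, and vocabulary size are illustrative.

```python
import torch
import torch.nn as nn

d_model, n_heads, seq_len, batch = 512, 8, 10, 2    # d_model and heads as in Vaswani et al.; rest illustrative

# One encoder block: multi-head self-attention + feed-forward, each with residual connection and layer norm.
encoder_layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads,
                                           dim_feedforward=2048, batch_first=True)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=6)

# Token embeddings plus positional encodings would be fed in here; a random tensor stands in for them.
x = torch.randn(batch, seq_len, d_model)
hidden = encoder(x)                                  # (batch, seq_len, d_model)

# The output side ends in a linear layer + softmax over the vocabulary.
vocab_size = 10000                                   # illustrative
logits = nn.Linear(d_model, vocab_size)(hidden)
probs = torch.softmax(logits, dim=-1)
print(hidden.shape, probs.shape)
```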
Transformer Self-Attention Layer
Alammar, J (2018). The Illustrated Transformer [Blog post]. Retrieved from https://jalammar.github.io/illustrated-transformer/
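A minimal NumPy sketch of the scaled dot-product self-attention computed inside each head, following Vaswani et al. (2017): queries, keys, and values are linear projections of the token representations, and each output position is a softmax-weighted mixture of the value vectors. The dimensions below are arbitrary illustrative choices.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
seq_len, d_model, d_k = 4, 8, 8          # 4 tokens; sizes are illustrative

X = rng.normal(size=(seq_len, d_model))  # token embeddings (+ positional encodings)
W_q = rng.normal(size=(d_model, d_k))    # learned projection matrices (random stand-ins here)
W_k = rng.normal(size=(d_model, d_k))
W_v = rng.normal(size=(d_model, d_k))

Q, K, V = X @ W_q, X @ W_k, X @ W_v
scores = Q @ K.T / np.sqrt(d_k)          # how strongly each token attends to every other token
weights = softmax(scores, axis=-1)       # each row sums to 1
output = weights @ V                     # Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V
print(weights.round(2))
print(output.shape)                      # (seq_len, d_k)
```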
Multi-Head Attention
Alammar, J (2018). The Illustrated Transformer [Blog post]. Retrieved from https://jalammar.github.io/illustrated-transformer/
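A minimal sketch of multi-head attention, extending the single-head computation above: each head applies its own projections so it can attend to different aspects of the sequence, and the head outputs are concatenated and projected back to the model dimension. Sizes are again arbitrary illustrative choices.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(X, W_q, W_k, W_v):
    """Scaled dot-product attention for one head."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    return softmax(Q @ K.T / np.sqrt(K.shape[-1]), axis=-1) @ V

rng = np.random.default_rng(0)
seq_len, d_model, n_heads = 4, 8, 2
d_k = d_model // n_heads                       # per-head dimension

X = rng.normal(size=(seq_len, d_model))
heads = []
for _ in range(n_heads):                       # each head has its own learned projections
    W_q, W_k, W_v = (rng.normal(size=(d_model, d_k)) for _ in range(3))
    heads.append(attention(X, W_q, W_k, W_v))

W_o = rng.normal(size=(d_model, d_model))      # output projection
multi_head_out = np.concatenate(heads, axis=-1) @ W_o
print(multi_head_out.shape)                    # (seq_len, d_model)
```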
Attention in Sequence Analysis
Elnaggar, Ahmed, et al. "ProtTrans: Toward understanding the language of life through self-supervised learning." IEEE Transactions on Pattern Analysis and Machine Intelligence 44.10 (2021): 7112-7127. https://github.com/agemagician/ProtTrans
ProtTrans Architecture for Sequence Embedding
Elnaggar, Ahmed, et al. "ProtTrans: Toward understanding the language of life through self-supervised learning." IEEE Transactions on Pattern Analysis and Machine Intelligence 44.10 (2021): 7112-7127.
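A minimal sketch of how per-residue embeddings can be pulled from a pretrained ProtTrans model via the Hugging Face transformers library, roughly following the usage shown in the ProtTrans repository. The model ID, the example protein sequence, and the preprocessing details below are assumptions and should be checked against the linked repository and the Colab notebook used in the hands-on session.

```python
import re
import torch
from transformers import T5EncoderModel, T5Tokenizer

# Model ID assumed from the ProtTrans repository (ProtT5-XL encoder); verify before use.
model_name = "Rostlab/prot_t5_xl_half_uniref50-enc"
tokenizer = T5Tokenizer.from_pretrained(model_name, do_lower_case=False)
model = T5EncoderModel.from_pretrained(model_name).eval()

# Hypothetical example protein sequence; ProtTrans expects space-separated residues,
# with rare amino acids (U, Z, O, B) mapped to X.
sequence = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
prepared = " ".join(re.sub(r"[UZOB]", "X", sequence))

inputs = tokenizer(prepared, return_tensors="pt")
with torch.no_grad():
    embeddings = model(**inputs).last_hidden_state   # (1, residues + special token, hidden_dim)

# Drop the final special token and mean-pool over residues for a per-protein embedding.
per_protein = embeddings[0, :-1].mean(dim=0)
print(per_protein.shape)
```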
Hands-on Tutorial
Google Colab notebook
Link: Colab-Notebook-Transformer
Break!
We will reconvene in 15 minutes. In the meantime, we are available for Q&A.
Next in line: Hands-on case study of Protein Function Annotation