Natural Language Processing (CSEP 517)
Spring 2025 ● Noah Smith
Many figures from Jurafsky and Martin ch. 9 & 13
Transformer, predicting a single next word
Encoder-decoder transformer
Probability distribution of possible generated sequences
Greedy decoding
Greedy decoding is not optimal
Beam search (k = 2)
Beam search scoring (k = 2)
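A minimal sketch contrasting greedy decoding with beam search at k = 2. The slides give only the titles, so everything here is an assumption made for illustration: `TOY_MODEL` and its probabilities are invented, `next_token_probs` stands in for a real model's next-word distribution, sequences have a fixed length of two tokens, and hypotheses are scored by summed log-probability with no length normalization.

```python
import math

# Hypothetical toy model (an assumption, not from the slides):
# next-token distributions keyed by the prefix generated so far.
TOY_MODEL = {
    (): {"a": 0.6, "b": 0.4},
    ("a",): {"x": 0.55, "y": 0.45},
    ("b",): {"x": 0.9, "y": 0.1},
}

def next_token_probs(prefix):
    """Stand-in for a model's next-token distribution given a prefix."""
    return TOY_MODEL[tuple(prefix)]

def greedy_decode(steps):
    """Pick the single most probable token at every step."""
    seq, logp = [], 0.0
    for _ in range(steps):
        probs = next_token_probs(seq)
        token = max(probs, key=probs.get)
        logp += math.log(probs[token])
        seq.append(token)
    return seq, logp

def beam_search(steps, k):
    """Keep the k highest-scoring partial sequences at every step."""
    beams = [([], 0.0)]  # (tokens, summed log-probability)
    for _ in range(steps):
        candidates = []
        for seq, logp in beams:
            for token, p in next_token_probs(seq).items():
                candidates.append((seq + [token], logp + math.log(p)))
        # Sort all expansions by score and prune to the beam width.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:k]
    return beams

greedy_seq, greedy_logp = greedy_decode(steps=2)
beam_seq, beam_logp = beam_search(steps=2, k=2)[0]
print(greedy_seq, math.exp(greedy_logp))
print(beam_seq, math.exp(beam_logp))
```

In this toy setup greedy decoding commits to "a" because it is the most probable first token, while beam search also keeps "b" alive and ends up with a higher-probability full sequence. With k = 1 the same `beam_search` function reduces to greedy decoding.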
Beam search (as implemented in typical libraries)
Beam search with improvement from Kasai et al. (2024)
Alternative beam search
If beam search is the answer, what is the question? (Meister et al., 2020)
Sampling from the model
Softmax with temperature
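A minimal sketch of softmax with temperature and of sampling from the resulting distribution, assuming the standard definition in which each logit z_i is divided by a temperature T > 0 before normalizing, so that p_i is proportional to exp(z_i / T). The function names, the example logits, and the `rng` parameter are illustrative assumptions, not taken from the slides.

```python
import math
import random

def softmax_with_temperature(logits, temperature=1.0):
    """Turn logits into probabilities after dividing them by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_token(logits, temperature=1.0, rng=random):
    """Draw one token index from the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

logits = [2.0, 1.0, 0.0]  # hypothetical scores for a three-word vocabulary
sharp = softmax_with_temperature(logits, 0.5)  # T < 1: more peaked
plain = softmax_with_temperature(logits, 1.0)  # T = 1: ordinary softmax
flat = softmax_with_temperature(logits, 2.0)   # T > 1: closer to uniform
print(sharp, plain, flat)
print(sample_token(logits, temperature=1.0, rng=random.Random(0)))
```

Lowering the temperature concentrates probability on the highest-scoring token, so sampling approaches greedy decoding as T goes toward 0; raising it flattens the distribution and makes samples more varied.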