1 of 15

Natural Language Processing (CSEP 517)

Spring 2025 ● Noah Smith

Many figures from Jurafsky and Martin ch. 9 & 13

2 of 15

Transformer, predicting a single next word

3 of 15

Encoder-decoder transformer

4 of 15

Probability distribution of possible generated sequences

5 of 15

Greedy decoding

6 of 15

Greedy decoding is not optimal
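
The slide figures are not reproduced here, so the following is only a minimal sketch of greedy decoding and of why it can miss the most probable sequence. The toy distribution `TOY_MODEL` and the helper `next_token_probs` are hypothetical stand-ins for a trained model; they are not taken from the lecture.

```python
import math

# Hypothetical toy "model": next-token probabilities given the prefix so far.
TOY_MODEL = {
    (): {"a": 0.6, "b": 0.4},
    ("a",): {"x": 0.55, "y": 0.45},
    ("b",): {"x": 0.9, "y": 0.1},
}


def next_token_probs(prefix):
    # Assumption: after two tokens the toy model always ends the sequence.
    return TOY_MODEL.get(tuple(prefix), {"</s>": 1.0})


def greedy_decode(max_len=10, eos="</s>"):
    """Pick the single most probable next token at every step."""
    prefix, logprob = [], 0.0
    for _ in range(max_len):
        probs = next_token_probs(prefix)
        token = max(probs, key=probs.get)  # locally best choice
        logprob += math.log(probs[token])
        prefix.append(token)
        if token == eos:
            break
    return prefix, logprob


sequence, logprob = greedy_decode()
print(sequence, math.exp(logprob))
```

In this toy setting greedy decoding commits to "a" first (0.6 > 0.4) and ends with a sequence of probability 0.6 × 0.55 = 0.33, while the sequence starting with "b" followed by "x" has probability 0.4 × 0.9 = 0.36. The locally best first choice rules out the globally best sequence.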

7 of 15

Beam search (k = 2)

8 of 15

Beam search scoring (k = 2)
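
As a rough illustration of beam search with k = 2, the sketch below scores each hypothesis by the sum of its token log-probabilities and keeps the k best at every step. It is a simplified variant under stated assumptions, not the library implementation or the Kasai et al. variant mentioned on later slides. `TOY_MODEL` and `next_token_probs` are hypothetical stand-ins for a trained model.

```python
import math

# Hypothetical toy "model": next-token probabilities given the prefix so far.
TOY_MODEL = {
    (): {"a": 0.6, "b": 0.4},
    ("a",): {"x": 0.55, "y": 0.45},
    ("b",): {"x": 0.9, "y": 0.1},
}


def next_token_probs(prefix):
    # Assumption: after two tokens the toy model always ends the sequence.
    return TOY_MODEL.get(tuple(prefix), {"</s>": 1.0})


def beam_search(k=2, max_len=10, eos="</s>"):
    """Keep the k highest-scoring hypotheses at each step.

    A hypothesis is (tokens, score), where score is the sum of log-probabilities.
    """
    beam = [((), 0.0)]
    finished = []
    for _ in range(max_len):
        # Expand every hypothesis on the beam by every possible next token.
        candidates = []
        for tokens, score in beam:
            for token, p in next_token_probs(tokens).items():
                candidates.append((tokens + (token,), score + math.log(p)))
        # Prune to the k best; hypotheses that emitted eos leave the beam.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beam = []
        for tokens, score in candidates[:k]:
            if tokens[-1] == eos:
                finished.append((tokens, score))
            else:
                beam.append((tokens, score))
        if not beam:
            break
    return max(finished + beam, key=lambda c: c[1])


tokens, score = beam_search(k=2)
print(tokens, math.exp(score))
```

With k = 2 both first tokens survive the first pruning step, so the hypothesis beginning with "b" is still available when its strong continuation "x" is scored. Setting k = 1 reduces this sketch to greedy decoding.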

9 of 15

Beam search (as implemented in typical libraries)

10 of 15

Beam search with improvement from Kasai et al. (2024)

11 of 15

Alternative beam search

12 of 15

If beam search is the answer, what is the question? (Meister et al., 2020)

13 of 15

Sampling from the model
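
One common reading of "sampling from the model" is to draw each next token at random in proportion to its probability, rather than taking the argmax. The sketch below assumes that reading. `TOY_MODEL` and `next_token_probs` are hypothetical stand-ins for a trained model.

```python
import random

# Hypothetical toy "model": next-token probabilities given the prefix so far.
TOY_MODEL = {
    (): {"a": 0.6, "b": 0.4},
    ("a",): {"x": 0.55, "y": 0.45},
    ("b",): {"x": 0.9, "y": 0.1},
}


def next_token_probs(prefix):
    # Assumption: after two tokens the toy model always ends the sequence.
    return TOY_MODEL.get(tuple(prefix), {"</s>": 1.0})


def sample_sequence(rng, max_len=10, eos="</s>"):
    """Draw one sequence, sampling each token from the model's distribution."""
    prefix = []
    for _ in range(max_len):
        probs = next_token_probs(prefix)
        tokens = list(probs)
        weights = [probs[t] for t in tokens]
        token = rng.choices(tokens, weights=weights, k=1)[0]
        prefix.append(token)
        if token == eos:
            break
    return prefix


rng = random.Random(0)  # fixed seed only so that runs are repeatable
for _ in range(3):
    print(sample_sequence(rng))
```

Unlike greedy decoding or beam search, repeated calls can return different sequences, and over many draws each sequence should appear roughly in proportion to its probability under the model.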

14 of 15

Softmax with temperature
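
A minimal sketch of softmax with temperature, assuming the standard form in which each logit is divided by a temperature before normalizing. The function name `softmax_with_temperature` and the example logits are illustrative, not taken from the slides.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Turn logits into probabilities after dividing them by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # illustrative values
for t in (0.5, 1.0, 2.0):
    print(t, softmax_with_temperature(logits, t))
```

Temperatures below 1 concentrate probability on the highest-scoring token, so sampling behaves more like greedy decoding. Temperatures above 1 flatten the distribution and make sampled output more varied.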

15 of 15