Natural Language Processing (CSEP 517)

Spring 2025 ● Noah Smith

Many figures from Jurafsky and Martin ch. 9 & 13

Transformer, predicting a single next word

Encoder-decoder transformer

Probability distribution of possible generated sequences

Greedy decoding

Greedy decoding is not optimal
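
A small sketch of why greedy decoding can miss the most probable sequence. The two-step model and its probabilities (`FIRST`, `SECOND`) are invented for illustration and do not come from the slides. Greedy decoding commits to the locally best first word, while an exhaustive search over all two-word sequences finds a higher-probability output.

```python
# Hypothetical two-step model over a tiny vocabulary (illustrative numbers only).
# FIRST[w1] = P(w1 | start); SECOND[w1][w2] = P(w2 | w1).
FIRST = {"a": 0.6, "b": 0.4}
SECOND = {
    "a": {"x": 0.4, "y": 0.3, "z": 0.3},
    "b": {"x": 0.9, "y": 0.05, "z": 0.05},
}


def sequence_prob(w1, w2):
    """Joint probability of the two-word sequence (w1, w2)."""
    return FIRST[w1] * SECOND[w1][w2]


def greedy_decode():
    """Pick the most probable word at each step, never revisiting a choice."""
    w1 = max(FIRST, key=FIRST.get)
    w2 = max(SECOND[w1], key=SECOND[w1].get)
    return (w1, w2), sequence_prob(w1, w2)


def exhaustive_decode():
    """Score every two-word sequence and return the most probable one."""
    all_sequences = [(w1, w2) for w1 in FIRST for w2 in SECOND[w1]]
    best = max(all_sequences, key=lambda s: sequence_prob(*s))
    return best, sequence_prob(*best)


if __name__ == "__main__":
    print("greedy:    ", greedy_decode())
    print("exhaustive:", exhaustive_decode())
```

In this toy model greedy decoding starts with "a" because 0.6 > 0.4, but the best continuation of "b" is confident enough that the sequence starting with "b" wins overall.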

Beam search (k = 2)

Beam search scoring (k = 2)
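
A minimal sketch of beam search with scores computed as summed log-probabilities, assuming a fixed number of decoding steps and no end-of-sequence handling or length normalization. The function `next_token_probs` is a hypothetical stand-in for a real model, with invented probabilities; library implementations differ in how they handle finished hypotheses and stopping.

```python
import math


def next_token_probs(prefix):
    """Hypothetical toy model: returns P(next token | prefix) as a dict."""
    if len(prefix) == 0:
        return {"a": 0.6, "b": 0.4}
    if prefix[-1] == "a":
        return {"x": 0.4, "y": 0.3, "z": 0.3}
    return {"x": 0.9, "y": 0.05, "z": 0.05}


def beam_search(k, steps):
    """Keep the k highest-scoring prefixes at every step.

    Each beam entry is (prefix, score), where score is the sum of
    log-probabilities of the tokens in the prefix.
    """
    beams = [((), 0.0)]  # start with the empty prefix, log-probability 0
    for _ in range(steps):
        candidates = []
        for prefix, score in beams:
            for token, p in next_token_probs(prefix).items():
                candidates.append((prefix + (token,), score + math.log(p)))
        # Sort all extensions by score and keep only the top k.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:k]
    return beams


if __name__ == "__main__":
    for prefix, score in beam_search(k=2, steps=2):
        print(prefix, round(math.exp(score), 4))
```

With `k=1` this reduces to greedy decoding. With `k=2` the prefix starting with "b" survives the first step, so its high-probability continuation can be found in the second.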

Beam search (as implemented in typical libraries)

Beam search with improvement from Kasai et al. (2024)

Alternative beam search

If beam search is the answer, what is the question? (Meister et al., 2020)

Sampling from the model

Softmax with temperature
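
A short sketch of temperature-scaled softmax and sampling from the resulting distribution. Given logits z and temperature T > 0, the probability of token i is exp(z_i / T) / sum_j exp(z_j / T). The helper names (`softmax_with_temperature`, `sample_token`) and the example logits are illustrative assumptions, not taken from the slides.

```python
import math
import random


def softmax_with_temperature(logits, temperature=1.0):
    """Convert logits to probabilities after dividing by the temperature.

    Temperatures below 1 sharpen the distribution; temperatures above 1
    flatten it toward uniform.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits, temperature=1.0, rng=random):
    """Draw one token index at random from the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.0]  # illustrative values
    for t in (0.5, 1.0, 2.0):
        print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])
    print("sampled index:", sample_token(logits, temperature=1.0))
```

As the temperature approaches zero, sampling behaves more and more like greedy decoding, since nearly all probability mass sits on the highest-logit token.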