1 of 55

CSCI-SHU 376: Natural Language Processing

Hua Shen

2026-03-12

Spring 2026

Lecture 11: LLM Decoding / Semantic Parsing

2 of 55

Today’s Plan

  • LLM Decoding Overview
  • Sampling
  • Controllable Generation
  • Semantic Parsing

3 of 55

What is inside an LLM?

  • A model defines a conditional probability distribution over the next token given the context so far
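
In symbols (a standard autoregressive factorization; the notation is assumed here, not taken from the slides), the model scores an output y given input x as

```latex
P(y \mid x) = \prod_{t=1}^{T} P(y_t \mid x, y_{<t})
```

where each factor is a locally normalized next-token distribution.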

4 of 55

LMs are locally normalized

  • A sequence can start with low-probability tokens yet still have high overall probability
  • As a result, inference with global constraints is hard (see the worked example below)
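
A hypothetical two-candidate example: suppose candidate A has step probabilities 0.4 × 0.9 = 0.36, while candidate B has 0.6 × 0.5 = 0.30. Greedy decoding commits to B's first token (0.6 > 0.4) and can never recover the globally more probable sequence A, because each step's distribution is normalized on its own.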

5 of 55

Probability distribution -> Hallucination

  • The model generally assigns non-zero probability to any output, including incorrect ones

6 of 55

Our goal: Get “Good” Outputs

  • How do we pick a “good” output given the model’s probability distribution?
  • One lever: changing the decoding algorithm

7 of 55

Today’s Plan

  • LLM Decoding Overview
  • Sampling
  • Controllable Generation
  • Semantic Parsing

8 of 55

Recap: Greedy Decoding

  • Greedy Decoding: compute the argmax over the entire vocabulary at every step (sketch below)
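
A minimal sketch of greedy decoding; `next_token_probs` is an assumed stand-in for one forward pass of the model, returning a probability distribution over the vocabulary:

```python
import numpy as np

def greedy_decode(next_token_probs, prompt_ids, eos_id, max_len=50):
    """Greedy decoding: at each step, append the single most probable
    next token (argmax over the entire vocabulary)."""
    ids = list(prompt_ids)
    for _ in range(max_len):
        probs = next_token_probs(ids)  # shape: (vocab_size,)
        tok = int(np.argmax(probs))    # the most probable token
        ids.append(tok)
        if tok == eos_id:              # stop at end-of-sequence
            break
    return ids
```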

9 of 55

Recap: Beam Search

  • At every step, keep track of the k most probable partial translations
  • Score of each hypothesis = log probability of sequence so far

  • Not guaranteed to be optimal, but more efficient than exhaustive search
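
A simplified beam-search sketch under the same assumed `next_token_probs` interface; real implementations batch these forward passes and often length-normalize the scores:

```python
import numpy as np

def beam_search(next_token_probs, prompt_ids, eos_id, k=4, max_len=50):
    """Keep the k most probable partial hypotheses; each is scored
    by the log probability of the sequence so far."""
    beams = [(0.0, list(prompt_ids), False)]  # (score, tokens, finished)
    for _ in range(max_len):
        candidates = []
        for score, ids, done in beams:
            if done:                           # finished beams carry over
                candidates.append((score, ids, True))
                continue
            logp = np.log(next_token_probs(ids) + 1e-12)
            for tok in np.argsort(logp)[-k:]:  # k best continuations
                candidates.append((score + logp[tok], ids + [int(tok)],
                                   int(tok) == eos_id))
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:k]
        if all(done for _, _, done in beams):
            break
    return beams[0][1]  # the highest-scoring hypothesis
```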

10 of 55

Highest probability always best?

  • Outputs with low probability tend to be worse

  • The quality difference between the top outputs is unclear…

11 of 55

Highest probability always best?

  • Many outputs are meaningful!

12 of 55

Ancestral Sampling

  • Samples exactly from the model’s distribution!
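
A sketch, again assuming a `next_token_probs` helper: because each token is drawn from the model's full next-token distribution, a completed sequence is an exact sample from the model.

```python
import numpy as np

def ancestral_sample(next_token_probs, prompt_ids, eos_id, max_len=50):
    """Ancestral sampling: draw each next token from the model's
    full distribution instead of taking the argmax."""
    rng = np.random.default_rng()
    ids = list(prompt_ids)
    for _ in range(max_len):
        probs = next_token_probs(ids)
        tok = int(rng.choice(len(probs), p=probs))  # sample, don't argmax
        ids.append(tok)
        if tok == eos_id:
            break
    return ids
```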

13 of 55

Ancestral Sampling

  • Long-tail problem
  • Even if each individual token in the long tail has low probability, these small probabilities add up…

14 of 55

Top-K Sampling

  • Only sample from the k most probable tokens
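
A sketch of the per-step filter; the sampling loop is the same as ancestral sampling, only the distribution is truncated first:

```python
import numpy as np

def top_k_filter(probs, k=50):
    """Keep only the k most probable tokens, then renormalize."""
    kept = np.argsort(probs)[-k:]      # indices of the top-k tokens
    filtered = np.zeros_like(probs)
    filtered[kept] = probs[kept]
    return filtered / filtered.sum()
```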

15 of 55

Top-p Sampling

  • Also called nucleus sampling
  • Only sample from the smallest set of tokens covering the top p probability mass
  • Ignore the long tail
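
A sketch of the nucleus filter; the cutoff is the smallest prefix of the sorted distribution whose cumulative mass reaches p:

```python
import numpy as np

def top_p_filter(probs, p=0.9):
    """Keep the smallest set of most-probable tokens with total
    mass >= p; zero out the long tail and renormalize."""
    order = np.argsort(probs)[::-1]            # most probable first
    cum = np.cumsum(probs[order])
    cutoff = int(np.searchsorted(cum, p)) + 1  # size of the nucleus
    kept = order[:cutoff]
    filtered = np.zeros_like(probs)
    filtered[kept] = probs[kept]
    return filtered / filtered.sum()
```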

16 of 55

Epsilon Sampling

  • Only sample tokens whose probability is at least a fixed threshold ε; prune everything below it (Hewitt et al., 2022)
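
A sketch of the epsilon filter; the default threshold value is illustrative:

```python
import numpy as np

def epsilon_filter(probs, eps=3e-4):
    """Drop every token whose probability is below eps, then
    renormalize; unlike top-k/top-p, the cutoff is absolute."""
    filtered = np.where(probs >= eps, probs, 0.0)
    if filtered.sum() == 0.0:          # fall back if all tokens pruned
        filtered[np.argmax(probs)] = 1.0
    return filtered / filtered.sum()
```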

17 of 55

Contrastive Decoding

  • Smaller (“amateur”) models make different mistakes than larger (“expert”) models

  • Choose outputs that the “expert” finds much more likely than the “amateur”
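
A simplified sketch of one decoding step; the published method (Li et al., 2023) uses an adaptive plausibility cutoff, approximated here with a fixed `alpha`:

```python
import numpy as np

def contrastive_step(expert_probs, amateur_probs, alpha=0.1):
    """Among tokens the expert finds plausible (within a factor
    alpha of its best token), pick the token whose probability
    under the expert most exceeds its probability under the amateur."""
    plausible = expert_probs >= alpha * expert_probs.max()
    scores = np.log(expert_probs + 1e-12) - np.log(amateur_probs + 1e-12)
    scores[~plausible] = -np.inf       # never pick implausible tokens
    return int(np.argmax(scores))
```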

18 of 55

Today’s Plan

  • LLM Decoding Overview
  • Sampling
  • Controllable Generation
  • Semantic Parsing

19 of 55

Different types of Constraints

  • Low-level constraints: structured output, length, etc.


20 of 55

Different types of Constraints

  • High-level constraints: semantics, avoiding hallucination, etc.


21 of 55

Prompting is not enough!


22 of 55

Constrained decoding: Manipulate logits

  • Set P(“climb” | X, y) = 0?
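
A sketch of the mechanism: setting a token's logit to -inf makes its post-softmax probability exactly 0, so the sampler can never choose it at this step:

```python
import numpy as np

def ban_tokens(logits, banned_ids):
    """Constrained decoding via logit manipulation: banned tokens
    get probability 0 after the softmax."""
    logits = logits.copy()
    logits[list(banned_ids)] = -np.inf
    return logits
```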


23 of 55

Constrained decoding: Rejection Sampling

  • Generate many samples, then reject those that violate the constraint
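
A sketch; `generate` and `satisfies_constraint` are assumed callables, not APIs from the slides. The approach is simple but wasteful when valid outputs are rare under the model:

```python
def rejection_sample(generate, satisfies_constraint, max_tries=100):
    """Draw complete unconstrained samples and keep the first one
    that satisfies the constraint."""
    for _ in range(max_tries):
        candidate = generate()         # one full model sample
        if satisfies_constraint(candidate):
            return candidate
    return None                        # no valid sample found
```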


24 of 55

Today’s Plan

  • LLM Decoding Overview
  • Sampling
  • Controllable Generation
  • Semantic Parsing

25 of 55

Semantic Parsing


26 of 55

Semantic Parsing: QA


27 of 55

Semantic Parsing: Instructions


28 of 55

Language to Meaning


29 of 55

Neural Semantic Parsing


30 of 55

Text-to-SQL Semantic Parsing

[Diagram of the task:]

Input: a natural language question (“How many cities have at least 25,000 people?”) together with a database schema (columns: City, Population, Area)

Output: a SQL query, SELECT count(c1) FROM w WHERE c2 >= 25000, whose execution result is 4

31 of 55

Text-to-SQL Semantic Parsing: Evaluation Metrics

[Same diagram: the question “How many cities have at least 25,000 people?” and schema (City, Population, Area) as input; the SQL query SELECT count(c1) FROM w WHERE c2 >= 25000 and its execution result 4 as output.]

  • Logical Form Accuracy: does the predicted SQL query match the gold query?
  • Execution Accuracy: does executing the predicted query return the gold result?
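
A hedged sketch of the two metrics; `execute` stands in for running a query against the database and is an assumption, not an API from the slides:

```python
def logical_form_match(pred_sql, gold_sql):
    """Logical form accuracy: credit only queries that match the
    annotated gold SQL (here, exact string match)."""
    return pred_sql.strip() == gold_sql.strip()

def execution_match(pred_sql, gold_sql, execute):
    """Execution accuracy: credit any query that yields the gold
    result, even if written differently from the gold SQL."""
    return execute(pred_sql) == execute(gold_sql)
```

Note the trade-off: logical form accuracy is stricter (it rejects correct but differently written queries), while execution accuracy can be fooled by a wrong query that happens to return the right value.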

32 of 55

Text-to-SQL Semantic Parsing: Supervision

[Diagram of the available supervision:]

Natural Language Question (input): “How many cities have at least 25,000 people?”, with database schema (City, Population, Area)

SQL Query (supervision): SELECT count(c1) FROM w WHERE c2 >= 25000

Execution Result: 4

33 of 55


34 of 55

On the Potential of Lexico-logical Alignments for Semantic Parsing to SQL Queries

Tianze Shi, Chen Zhao, Jordan Boyd-Graber, Hal Daumé III, Lillian Lee

In Findings of EMNLP (2020)

35 of 55

Text-to-SQL Semantic Parsing: Supervision

[Same diagram, with one more supervision signal:]

Natural Language Question: “How many cities have at least 25,000 people?”, with database schema (City, Population, Area)

SQL Query: SELECT count(c1) FROM w WHERE c2 >= 25000 (execution result: 4)

+ Alignments between question tokens and SQL tokens

36 of 55

Dataset: SQUALL = “SQL + QUestion pairs ALigned Lexically”

  • Built on the existing dataset of WikiTableQuestions (Pasupat and Liang, 2015)

  • Collected expert annotations of logical forms and lexical alignments for more than 11k training instances

  • We also experimented with automatically-derived alignments


37 of 55

Annotation Interface


38 of 55

Annotation Interface


39 of 55

Alignment Annotations

  • ~half of the question tokens and ~90% of the SQL tokens are aligned (excluding basic keywords such as SELECT, FROM, WHERE, …)

  • Examples of frequently aligned segments


40 of 55

Model

  • SEQ2SEQ+
    • Our seq2seq base model with attention and copying mechanisms
    • Competitive with a state-of-the-art text-to-SQL semantic parser (Suhr et al., 2020), evaluated on the Spider dataset (Yu et al., 2018)


41 of 55

Model: Encoder

[Encoder diagram: word-embedding lookup over the natural language question (“How many cities have …”) and the table schema columns (City, Population, Area, Region); a bi-directional LSTM encodes the question, and bi-directional LSTMs encode each column name (taking the Bi-LSTM final states); attention and self-attention layers connect the question and schema representations.]

42 of 55

Model: Encoder w/ BERT

[Encoder diagram with BERT: the question and schema are jointly encoded as “[CLS] How many cities have … [SEP] City [SEP] Population [SEP] Area [SEP] … [SEP]”; the BERT outputs feed bi-directional LSTMs connected by attention.]

43 of 55

Model: Decoder

[Decoder diagram: a decoder LSTM attends over the encoded natural language question and table schema; starting from <START>, it generates the SQL tokens SELECT, count, …]

44 of 55

Model: Decoder

[Decoder diagram: at each step, an MLP over the decoder LSTM state first chooses the output type: a SQL keyword, a string (STR), or a column (COL). Keywords are predicted by an MLP, strings are produced by a copy mechanism over question tokens, and columns by a copy mechanism over columns. Example: <START> → SELECT → count.]

45 of 55

Model

  • SEQ2SEQ+
    • Our seq2seq base model with attention and copying mechanisms
    • Competitive with a state-of-the-art text-to-SQL semantic parser (Suhr et al., 2020), evaluated on the Spider dataset (Yu et al., 2018)

  • ALIGN
    • Same model architecture as SEQ2SEQ+, same inference steps
    • Two training strategies:
      • Supervised attention
      • Column Prediction


46 of 55

Model: Supervised Attention

  • Previously used in machine translation (Liu et al., 2016; Mi et al., 2016)
  • Decoder attention as an example; similar for encoder attention


47 of 55

Model: Supervised Attention

  • Previously used in machine translation (Liu et al., 2016; Mi et al., 2016)
  • Decoder attention as an example; similar for encoder attention

Question tokens: How many cities have at least 25,000 people ?

Target SQL: SELECT count ( c1 ) FROM w WHERE c2 >= 25000

[The diagram marks the SQL token the decoder is about to predict.]

48 of 55

Model: Supervised Attention

  • Previously used in machine translation (Liu et al., 2016; Mi et al., 2016)
  • Decoder attention as an example; similar for encoder attention

Question tokens: How many cities have at least 25,000 people ?

Target SQL: SELECT count ( c1 ) FROM w WHERE c2 >= 25000

[The diagram marks the SQL token the decoder is about to predict.]

Attention weights over the question tokens: 0.3 0.25 0.05 0.1 0.05 0.05 0.05 0.1 0.05

49 of 55

Model: Supervised Attention

  • Previously used in machine translation (Liu et al., 2016; Mi et al., 2016)
  • Decoder attention as an example; similar for encoder attention

Question tokens: How many cities have at least 25,000 people ?

Target SQL: SELECT count ( c1 ) FROM w WHERE c2 >= 25000

[The diagram marks the SQL token the decoder is about to predict.]

Attention weights over the question tokens: 0.3 0.25 0.05 0.1 0.05 0.05 0.05 0.1 0.05

Alignment vector (from the annotations): 0.5 0.5 0.0 0.0 0.0 0.0 0.0 0.0 0.0

50 of 55

Model: Supervised Attention

  • Previously used in machine translation (Liu et al., 2016; Mi et al., 2016)
  • Decoder attention as an example; similar for encoder attention

Question tokens: How many cities have at least 25,000 people ?

Target SQL: SELECT count ( c1 ) FROM w WHERE c2 >= 25000

[The diagram marks the SQL token the decoder is about to predict.]

Attention weights over the question tokens: 0.3 0.25 0.05 0.1 0.05 0.05 0.05 0.1 0.05

Alignment vector (from the annotations): 0.5 0.5 0.0 0.0 0.0 0.0 0.0 0.0 0.0

Loss: penalize the mismatch between the attention weights and the alignment vector (see the formulation below)

Final loss: a linear combination of the attention-supervision loss and the seq2seq loss
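
One common instantiation (an assumption; the slide does not pin down the exact form): treat the alignment vector a*_t as a target distribution for the attention weights α_t, penalize their cross-entropy, and mix it into the usual seq2seq objective:

```latex
\mathcal{L}_{\text{attn}} = -\frac{1}{T} \sum_{t=1}^{T} \sum_{i} a^{*}_{t,i} \log \alpha_{t,i},
\qquad
\mathcal{L} = \mathcal{L}_{\text{seq2seq}} + \lambda \, \mathcal{L}_{\text{attn}}
```

where λ weights the attention-supervision term.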

51 of 55

Results on WikiTableQuestions

  • Unsurprisingly, strong supervision beats previous weakly-supervised models on WTQ’s test set

[Bar chart of execution (EXE) accuracy on the WTQ test set, comparing Previous Best, ALIGN (single), and ALIGN (ensemble), with and without BERT; ALIGN improves over the previous best by +6.2 and +8.4 in the two settings.]

52 of 55

Alignment Annotations Provide Further Improvements

[Bar chart of logical form (LF) accuracy: SEQ2SEQ+ vs. ALIGN trained with automatic alignment, supervised decoder attention, and supervised encoder attention; ALIGN improves over SEQ2SEQ+ by +4.4.]

53 of 55

Analysis

  • Comparing ALIGN with SEQ2SEQ+

Absolute improvements of ALIGN over SEQ2SEQ+:

  • Logical form accuracy: +4.4
  • Template accuracy: +2.0
  • Column accuracy: +4.9

On unseen templates:

  • Logical form accuracy: +10.6
  • Execution accuracy: +12.5

54 of 55

Unrealized Potential

[Bar chart of LF accuracy: ALIGN improves over SEQ2SEQ+ by +4.4, while oracle attention improves by +23.9, leaving much of the potential unrealized.]

55 of 55

Interim Summary

  • We collect and release a large-scale text-to-SQL semantic parsing dataset with lexical alignment annotations

  • Models trained with lexical alignments improve over strong baselines by 4.4% logical form accuracy

  • There is still large unrealized potential
