1 of 60

Recitation 9

(CTC Decoding and Beam Search)

Gabrial, Harini & Quentin

2 of 60

HW1P2 vs. HW3P2

HW1P2

  • Sequence Classification for phoneme recognition.
  • Time-synchronous outputs.

HW3P2

  • Sequence to Sequence with Order Synchrony
  • Training: Using CTC Loss to deal with…
    • The problem of alignment
    • The problem of repetition
  • Inference
    • Greedy Search
    • Beam Search

3 of 60

Training

4 of 60

Problem Set-up

  • Inputs and targets are not time-aligned
    • |Y| ≠ |X|
    • |Y| and |X| not proportional
  • However, they are order-aligned
  • We will compress predicted sequences
  • We need to be able to yield repeating outputs after compression

5 of 60

Model

x(T-1)

x(T)

x(0)

x(1)

y(0)

Model

y(1)

Model

y(T-1)

Model

y(T)

x(1)

Model

y(1)

x(1)

Model

y(1)

Someone says “zoo”

We want to transcribe “zoo” at these time steps

Z

O

O

Inputs

Probabilities

Target

(Training)

Two problems

(1) time alignment and (2) repetition

One Asset

(1) order alignment

6 of 60

Model

x(T-1)

x(T)

x(0)

x(1)

y(0)

Model

y(1)

Model

y(T-1)

Model

y(T)

x(1)

Model

y(1)

x(1)

Model

y(1)

Someone says “zoo”

We want to transcribe “zoo” at these time steps

Z

O

O

?

?

?

?

?

?

Inputs

Probabilities

Target

All time-aligned targets for Target

?

?

?

?

?

?

?

?

?

?

?

?

Use CTC to get time-aligned targets

(Training)

7 of 60

The problem of repetition…

8 of 60

(Training)

A note on repetition…

“zzzooo”

“zoo”

“zo”

“zzzzzzzzo”

“zo” (🙁)

9 of 60

(Training)

A note on repetition…

“zzzooo”

“zoo”

“zo”

“zzzzzzzzo”

“zo” (🙁)

“zzz%oo%o”

“zo%o”

“zo”

“zzzzzzzzo”

“zoo” (😀)

“zzzooo”

“zoo”

“zo”

zzzzzzzo”

“zo” (🙁)

To account for repetitions, introduce a “break” character (“%”)

10 of 60

The problem of alignment…

11 of 60

(Training)

A note on alignment…

“z%o%oo”

“z%oo%o”

“zooo%o”

“zoo”(🙁)

12 of 60

(Training)

A note on alignment…

“z%o%oo”

“z%oo%o”

“zooo%o”

“zoo”(🙁)

If we use a time-aligned target, we can compute a differentiable loss.

But which alignment should we choose?

13 of 60

We could use the Viterbi algorithm to find the most probable path and use that to calculate the loss. However, there is a better option…

Using the asset of order alignment, as well as some additional “rules”, we can actually use ALL POSSIBLE ALIGNMENTS and calculate an EXPECTED LOSS over them.

14 of 60

We could use the Viterbi algorithm to find the most probable path and use that to calculate the loss. However, there is a better option…

Using the asset of order alignment, as well as some additional “rules”, we can actually use ALL POSSIBLE ALIGNMENTS and calculate an EXPECTED LOSS over them.

1. Alignments b/t X and Y are monotonic (once you advance to the next input, you may only keep the corresponding output or advance to the next output)

2. Alignment of X to Y is many-to-one (there may be many input elements aligning to a single output element).

3. Length of Y ≤ Length of X

15 of 60

Given: The “break” character, order alignment, and some additional rules…

Task: Find possible alignments, their probabilities, then calculate a differentiable loss

We could use the Viterbi algorithm to find the most probable path and use that to calculate the loss. However, there is a better option…

Using the asset of order alignment, as well as some additional “rules”, we can actually use ALL POSSIBLE ALIGNMENTS and calculate an EXPECTED LOSS over them.

16 of 60

(Training)

We want to transcribe “zoo” at these time steps

Z

O

O

Target

y(0)

y(1)

y(T-1)

y(T)

y(1)

y(1)

Probabilities

17 of 60

(Training)

We want to transcribe “zoo” at these time steps

Z

O

O

Target

P(“a”)

P(“b”)

.

.

.

P(“z”)

P(“a”)

P(“b”)

.

.

.

P(“z”)

P(“a”)

P(“b”)

.

.

.

P(“z”)

P(“a”)

P(“b”)

.

.

.

P(“z”)

P(“a”)

P(“b”)

.

.

.

P(“z”)

P(“a”)

P(“b”)

.

.

.

P(“z”)

Probabilities

18 of 60

(Training)

We want to transcribe “zoo” at these time steps

Z

O

O

Target

P(“%”)

P(“z”)

P(“%”)

P(“o”)

P(“%”)

P(“o”)

P(“%”)

P(“%”)

P(“z”)

P(“%”)

P(“o”)

P(“%”)

P(“o”)

P(“%”)

P(“%”)

P(“z”)

P(“%”)

P(“o”)

P(“%”)

P(“o”)

P(“%”)

P(“%”)

P(“z”)

P(“%”)

P(“o”)

P(“%”)

P(“o”)

P(“%”)

P(“%”)

P(“z”)

P(“%”)

P(“o”)

P(“%”)

P(“o”)

P(“%”)

P(“%”)

P(“z”)

P(“%”)

P(“o”)

P(“%”)

P(“o”)

P(“%”)

Probabilities

Limit to the characters in the Target with breaks inserted

19 of 60

%

z

%

<start>

<end>

“zoo”

o

%

o

%

%

z

%

o

%

o

%

%

z

%

o

%

o

%

%

z

%

o

%

o

%

%

z

%

o

%

o

%

%

z

%

o

%

o

%

%

z

%

o

%

o

%

Not all of these are “viable” paths/alignments. Which ones are?

(Training)

20 of 60

%

z

%

<start>

<end>

“zoo”

o

%

o

%

%

z

%

o

%

o

%

%

z

%

o

%

o

%

%

z

%

o

%

o

%

%

z

%

o

%

o

%

%

z

%

o

%

o

%

%

z

%

o

%

o

%

Let’s start from the outer edges and work our way in…

(Training)

21 of 60

%

z

%

<start>

<end>

“zoo”

o

%

o

%

%

z

%

o

%

o

%

%

z

%

o

%

o

%

%

z

%

o

%

o

%

%

z

%

o

%

o

%

%

z

%

o

%

o

%

%

z

%

o

%

o

%

Let’s start from the outer edges and work our way in…

(Training)

22 of 60

%

z

%

<start>

<end>

“zoo”

o

%

o

%

%

z

%

o

%

o

%

%

z

%

o

%

o

%

%

z

%

o

%

o

%

%

z

%

o

%

o

%

%

z

%

o

%

o

%

%

z

%

o

%

o

%

Let’s start from the outer edges and work our way in…

(Training)

23 of 60

%

z

%

<start>

<end>

“zoo”

o

%

o

%

%

z

%

o

%

o

%

%

z

%

o

%

o

%

%

z

%

o

%

o

%

%

z

%

o

%

o

%

%

z

%

o

%

o

%

%

z

%

o

%

o

%

Let’s start from the outer edges and work our way in…

Dotted lines are paths that may seem viable at first, but are not!

(Training)

This path won’t work, because it would skip the “z” (not monotonic)

24 of 60

%

z

%

<start>

<end>

“zoo”

o

%

o

%

%

z

%

o

%

o

%

%

z

%

o

%

o

%

%

z

%

o

%

o

%

%

z

%

o

%

o

%

%

z

%

o

%

o

%

%

z

%

o

%

o

%

Let’s start from the outer edges and work our way in…

(Training)

25 of 60

%

z

%

<start>

<end>

“zoo”

o

%

o

%

%

z

%

o

%

o

%

%

z

%

o

%

o

%

%

z

%

o

%

o

%

%

z

%

o

%

o

%

%

z

%

o

%

o

%

%

z

%

o

%

o

%

Let’s start from the outer edges and work our way in…

(Training)

26 of 60

%

z

%

<start>

<end>

“zoo”

o

%

o

%

%

z

%

o

%

o

%

%

z

%

o

%

o

%

%

z

%

o

%

o

%

%

z

%

o

%

o

%

%

z

%

o

%

o

%

%

z

%

o

%

o

%

Let’s start from the outer edges and work our way in…

Dotted lines are paths that may seem viable at first, but are not!

(Training)

This path won’t work, because it would skip the first “o”

27 of 60

%

z

%

<start>

<end>

“zoo”

o

%

o

%

%

z

%

o

%

o

%

%

z

%

o

%

o

%

%

z

%

o

%

o

%

%

z

%

o

%

o

%

%

z

%

o

%

o

%

%

z

%

o

%

o

%

Let’s start from the outer edges and work our way in…

(Training)

28 of 60

%

z

%

<start>

<end>

“zoo”

o

%

o

%

%

z

%

o

%

o

%

%

z

%

o

%

o

%

%

z

%

o

%

o

%

%

z

%

o

%

o

%

%

z

%

o

%

o

%

%

z

%

o

%

o

%

Let’s start from the outer edges and work our way in…

(Training)

29 of 60

%

z

%

<start>

<end>

“zoo”

o

%

o

%

%

z

%

o

%

o

%

%

z

%

o

%

o

%

%

z

%

o

%

o

%

%

z

%

o

%

o

%

%

z

%

o

%

o

%

%

z

%

o

%

o

%

Let’s start from the outer edges and work our way in…

This path won’t work because you won’t be able to get to the final character “in time” based on the sequence length.

Dotted lines are paths that may seem viable at first, but are not!

(Training)

30 of 60

%

z

%

<start>

<end>

“zoo”

o

%

o

%

%

z

%

o

%

o

%

%

z

%

o

%

o

%

%

z

%

o

%

o

%

%

z

%

o

%

o

%

%

z

%

o

%

o

%

%

z

%

o

%

o

%

These are the viable alignments

(skips bolded)

(Training)

31 of 60

%

z

%

<start>

<end>

“zoo”

o

%

o

%

%

z

%

o

%

o

%

%

z

%

o

%

o

%

%

z

%

o

%

o

%

%

z

%

o

%

o

%

%

z

%

o

%

o

%

%

z

%

o

%

o

%

These are the viable alignments

(Training)

32 of 60

%

z

%

<start>

<end>

“zoo”

o

%

o

%

%

z

%

o

%

o

%

%

z

%

o

%

o

%

%

z

%

o

%

o

%

%

z

%

o

%

o

%

%

z

%

o

%

o

%

%

z

%

o

%

o

%

At any given node, you can calculate the posterior probability of reaching that node at that time step using the product of the probability of reaching from the “forward” and “backward” passes

(See Lecture)

This is where dynamic programming comes in handy!

(Training)

33 of 60

%

z

%

<start>

<end>

“zoo”

o

%

o

%

%

z

%

o

%

o

%

%

z

%

o

%

o

%

%

z

%

o

%

o

%

%

z

%

o

%

o

%

%

z

%

o

%

o

%

%

z

%

o

%

o

%

We can calculate the probability of ALL ALIGNMENTS using the forward/backward method and use these to compute an EXPECTED LOSS.

(Training)

34 of 60

%

z

%

<start>

<end>

“zoo”

o

%

o

%

%

z

%

o

%

o

%

%

z

%

o

%

o

%

%

z

%

o

%

o

%

%

z

%

o

%

o

%

%

z

%

o

%

o

%

%

z

%

o

%

o

%

We can calculate the probability of ALL ALIGNMENTS using the forward/backward method and use these to compute an EXPECTED LOSS.

(Training)

NOW WE CAN TRAIN

(YAY!)

35 of 60

Training Procedure

With Connectionist Temporal Classification(CTC)

  1. Define model (e.g., deep bidirectional LSTM)
  2. Initialize network to output targets + “break” character
  3. Pass training instances through network to obtain probability distribution over labels/symbols
  4. Construct graph/table of “viable” alignments
  5. Compute probabilities of alignments using Forward/Backward Algorithm (see Lecture)
  6. Compute Expected Divergence over all alignments (see Lecture)
  7. Propagate gradients backward and update parameters

36 of 60

Model

x(T-1)

x(T)

x(0)

x(1)

y(0)

Model

y(1)

Model

y(T-1)

Model

y(T)

x(1)

Model

y(1)

x(1)

Model

y(1)

Someone says “zoo”

We want to transcribe “zoo” at these time steps

Z

O

O

?

?

?

?

?

?

Inputs

Probabilities

Target

All time-aligned targets for Target

?

?

?

?

?

?

?

?

?

?

?

?

Use CTC to get time-aligned targets

(Training)

37 of 60

Inference

38 of 60

Q: What’s different at inference?

39 of 60

Q: What’s different at inference?

A: No target

A: No sequence “rules”

40 of 60

Q: What’s different at inference?

A: No target

A: No sequence “rules”

The “tree” is about to get very large :/ (gulp)

41 of 60

What are our options?

  • Greedy Search

  • Exhaustive Search (not really possible)

  • Beam Search

42 of 60

Inference

Greedy Search

  • Greedy Search is an easy-to-implement option for CTC decoding at inference time
  • Greedy Search simply selects the most probable time step at each time-step
  • Although this method is easy to implement and fast, it has the disadvantage of missing out on high-probability (score) overall paths due to it’s greedy search

43 of 60

Y_probs

Greedy Search

DECODED STRING

?

44 of 60

Y_probs

Greedy Search

A

45 of 60

Y_probs

Greedy Search

A

B

46 of 60

Y_probs

Greedy Search

A

B

A

47 of 60

Y_probs

Greedy Search

A

B

A

B

48 of 60

Y_probs

Greedy Search

DECODED STRING

A B A B

0.391*0.341*0.402*0.358 = 0.0191884642

49 of 60

Inference

Beam Search

  • To have better decoding than Greedy Search, but keep the method feasible at the same time, we can choose to “explore” top-k paths at each time-step
  • By exploring more than one most-probable output sequences at each time-step, we will reach a sub-optimal path that is likely to be better than the Greedy Search strategy
  • By limiting our exploration options to a specific Beam Width - k, we also ensure that the computation is tractable, as opposed to the Exhaustive Search strategy

50 of 60

Y_probs

Beam Search

DECODED STRING

?

Seq Len

4

Symbol set

{‘-’, ‘A’, ‘B’, ‘C’}

Beam Width

3

Parameters

51 of 60

Scores

Beam Search

Possible Paths

Calculate Score

Score

-

1*0.140

0.140

A

1*0.391

0.391

B

1*0.197

0.197

C

1*0.271

0.271

Best Paths

Score

_

_

_

_

_

_

Old

52 of 60

Scores

Beam Search

Possible Paths

Calculate Score

Score

-

1*0.140

0.140

A

1*0.391

0.391

B

1*0.197

0.197

C

1*0.271

0.271

Best Paths

Score

_

_

_

_

_

_

Best Paths

Score

A

0.391

B

0.197

C

0.271

Old

New

53 of 60

Scores

Beam Search

Possible Paths

Calculate

Score

Score

A-

0.391*0.257

0.10048700

B-

0.197*0.257

0.05062900

C-

0.271*0.257

0.06964700

AA -> A

0.391*0.096

0.03753600

AB

0.391*0.341

0.13333100

AC

0.391*0.305

0.11925500

BA

0.197*0.096

0.01891200

BB -> B

0.197*0.341

0.06717700

BC

0.197*0.305

0.06008500

CA

0.271*0.096

0.02601600

CB

0.271*0.341

0.09241100

CC -> C

0.271*0.305

0.08265500

Best Paths

Score

A

0.391

B

0.197

C

0.271

Old

54 of 60

Scores

Beam Search

Possible Paths

Calculate Score

Score

A-

0.391*0.257

0.10048700

B-

0.197*0.257

0.05062900

C-

0.271*0.257

0.06964700

AA -> A

0.391*0.096

0.03753600

AB

0.391*0.341

0.13333100

AC

0.391*0.305

0.11925500

BA

0.197*0.096

0.01891200

BB -> B

0.197*0.341

0.06717700

BC

0.197*0.305

0.06008500

CA

0.271*0.096

0.02601600

CB

0.271*0.341

0.09241100

CC -> C

0.271*0.305

0.08265500

Best Paths

Score

A

0.391

B

0.197

C

0.271

Best Paths

Score

A-

0.10048700

AB

0.13333100

AC

0.11925500

Old

New

55 of 60

Scores

Beam Search

Possible Paths

Calculate Score

Score

A–-

0.100487*0.248

0.0249207760

AB-

0.133331*0.248

0.0330660880

AC-

0.119255*0.248

0.0295752400

A-A -> AA

0.100487*0.402

0.0403957740

ABA

0.133331*0.402

0.0535990620

ACA

0.119255*0.402

0.0479405100

A-B -> AB

0.100487*0.267

0.0268300290

ABB -> AB

0.133331*0.267

0.0355993770

ACB

0.119255*0.267

0.0318410850

A-C -> AC

0.100487*0.083

0.0083404210

ABC

0.133331*0.083

0.0110664730

ACC -> AC

0.119255*0.083

0.0098981650

Best Paths

Score

A-

0.10048700

AB

0.13333100

AC

0.11925500

Old

56 of 60

Scores

Beam Search

Possible Paths

Calculate Score

Score

A–-

0.100487*0.248

0.0249207760

AB-

0.133331*0.248

0.0330660880

AC-

0.119255*0.248

0.0295752400

A-A -> AA

0.100487*0.402

0.0403957740

ABA

0.133331*0.402

0.0535990620

ACA

0.119255*0.402

0.0479405100

A-B -> AB

0.100487*0.267

0.0268300290

ABB -> AB

0.133331*0.267

0.0355993770

ACB

0.119255*0.267

0.0318410850

A-C -> AC

0.100487*0.083

0.0083404210

ABC

0.133331*0.083

0.0110664730

ACC -> AC

0.119255*0.083

0.0098981650

Best Paths

Score

A-

0.10048700

AB

0.13333100

AC

0.11925500

Best Paths

Score

ABA

0.0535990620

ACA

0.0479405100

AB

0.0624294060

Old

New

57 of 60

Scores

Beam Search

Possible Paths

Calculate

Score

Score

ABA- -> ABA

0.0535990620*0.149

0.0079862602380

ACA- -> ACA

0.0479405100*0.149

0.0071431359900

AB- -> AB

0.0624294060*0.149

0.0093019814940

ABAA -> ABA

0.0535990620*0.336

0.0180092848320

ACAA -> ACA

0.0479405100*0.336

0.0161080113600

ABA

0.0624294060*0.336

0.0209762804160

ABAB

0.0535990620*0.358

0.0191884641960

ACAB

0.0479405100*0.358

0.0171627025800

ABB -> AB

0.0624294060*0.358

0.0223497273480

ABAC

0.0535990620*0.157

0.0084150527340

ACAC

0.0479405100*0.157

0.0075266600700

ABC

0.0624294060*0.157

0.0098014167420

Best Paths

Score

ABA

0.0535990620

ACA

0.0479405100

AB

0.0624294060

Old

58 of 60

Scores

Beam Search

Possible Paths

Calculate

Score

Score

ABA- -> ABA

0.0535990620*0.149

0.0079862602380

ACA- -> ACA

0.0479405100*0.149

0.0071431359900

AB- -> AB

0.0624294060*0.149

0.0093019814940

ABAA -> ABA

0.0535990620*0.336

0.0180092848320

ACAA -> ACA

0.0479405100*0.336

0.0161080113600

ABA

0.0624294060*0.336

0.0209762804160

ABAB

0.0535990620*0.358

0.0191884641960

ACAB

0.0479405100*0.358

0.0171627025800

ABB -> AB

0.0624294060*0.358

0.0223497273480

ABAC

0.0535990620*0.157

0.0084150527340

ACAC

0.0479405100*0.157

0.0075266600700

ABC

0.0624294060*0.157

0.0098014167420

Best Paths

Score

ABA

0.0535990620

ACA

0.0479405100

AB

0.0624294060

Best Paths

Score

ABA

0.04697182548

AB

0.031651708842

ACA

0.02325114735

Old

New

59 of 60

Beam Search

60 of 60

Beam Search