1 of 58

6.1120: Parsing

Top-Down Parsing

2 of 58

Martin Rinard

Laboratory for Computer Science

Massachusetts Institute of Technology

MIT 6.1100 (6.035) �Top-Down Parsing

3 of 58

Orientation

  • Language specification
    • Lexical structure – regular expressions
    • Syntactic structure – grammar
  • This Lecture - recursive descent parsers
    • Code parser as set of mutually recursive procedures
    • Structure of program matches structure of grammar

4 of 58

Starting Point

  • Assume lexical analysis has produced a sequence of tokens
    • Each token has a type and value
    • Types correspond to terminals
    • Values to contents of token read in
  • Examples
    • Int 549 – integer token with value 549 read in
    • if - if keyword, no need for a value
    • AddOp + - add operator, value +

5 of 58

Recognition Approach:�<2-1>+1

Open Int Op Int Close Op Int

Open Expr Op Int Close Op Int

Open Expr Op Expr Close Op Int

Open Expr Close Op Int

Expr Op Int

Expr Op Expr

Expr

Start

Op = +|-|*|/

Int = [0-9] [0-9]*

Open = <

Close = >

1) StartExpr

2) ExprExpr Op Expr

3) Expr → Int

4) Expr → Open Expr Close

6 of 58

Generative Approach: �<2-1>+1

Start

Expr

Expr Op Expr

Open Expr Close Op Expr

Open Expr Op Expr Close Op Expr

Open Int Op Expr Close Op Expr

Open Int Op Expr Close Op Int

Open Int Op Int Close Op Int

Op = +|-|*|/

Int = [0-9] [0-9]*

Open = <

Close = >

1) StartExpr

2) ExprExpr Op Expr

3) Expr → Int

4) Expr → Open Expr Close

7 of 58

Grammar for Today

StartExpr

ExprExpr + Term

ExprExpr - Term

ExprTerm

TermTerm * Int

TermTerm / Int

Term → Int

  • Set of tokens is { +, -, *, /, Int }, where Int = [0-9][0-9]*
  • For convenience, may represent each Int n token by n

8 of 58

Goal: Recursive Descent Parser �(Generative/Top-Down Approach

Term()

if (token = Int n)

oldToken = token; token = NextToken();

node = TermPrime();

if (node == NULL) return oldToken;

else return(new TermNode(oldToken, node);

else throw SyntaxError

TermPrime()

if (token = *) || (token = /)

first = token; next = NextToken();

if (next = Int n)

token = NextToken();

return(new TermPrimeNode(first, next, TermPrime())

else throw SyntaxError

else return(NULL)

9 of 58

Basic Approach

  • Start with Start symbol
  • Build a leftmost derivation
    • If leftmost symbol is nonterminal, choose a production and apply it
    • If leftmost symbol is terminal, match against input
    • If all terminals match, have found a parse!
    • Key: find correct productions for nonterminals

10 of 58

Graphical Illustration of Leftmost Derivation

Apply Production Here

NT1 T1 T2 T3 NT2 NT3

Not Here

Sentential Form

11 of 58

Start

Parse

Tree

Sentential Form

Remaining Input

2-2*2

Start

Current Position in Parse Tree

Parsing Example

12 of 58

Applied Production

StartExpr

Start

Parse

Tree

Sentential Form

Remaining Input

2-2*2

Expr

Expr

Current Position in Parse Tree

Parsing Example

13 of 58

Applied Production

ExprExpr - Term

Parse

Tree

Sentential Form

Remaining Input

2-2*2

Expr - Term

Start

Expr

Term

Expr

-

Parsing Example

ExprExpr + Term

ExprExpr - Term

ExprTerm

14 of 58

Applied Production

ExprTerm

Start

Parse

Tree

Sentential Form

Remaining Input

2-2*2

Term - Term

Expr

Term

Expr

-

Term

Parsing Example

ExprExpr + Term

ExprExpr - Term

ExprTerm

15 of 58

Applied Production

Term → Int

Start

Parse

Tree

Sentential Form

Remaining Input

2-2*2

Expr

Term

Expr

-

Term

Int

Int - Term

Parsing Example

16 of 58

Start

Parse

Tree

Sentential Form

Remaining Input

2-2*2

2 - Term

Expr

Term

Expr

-

Term

Match

Input

Token!

Int 2

Parsing Example

17 of 58

Start

Parse

Tree

Sentential Form

Remaining Input

-2*2

2 - Term

Expr

Term

Expr

-

Term

Match

Input

Token!

Int 2

Parsing Example

18 of 58

Start

Parse

Tree

Sentential Form

Remaining Input

2*2

2 - Term

Expr

Term

Expr

-

Term

Match

Input

Token!

Int 2

Parsing Example

19 of 58

Applied Production

TermTerm * Int

Start

Parse

Tree

Sentential Form

Remaining Input

2*2

2 - Term*Int

Expr

Term

Expr

-

Term

Term

Int

*

Int 2

Parsing Example

20 of 58

Applied Production

Term → Int

Start

Parse

Tree

Sentential Form

Remaining Input

2*2

2 - Int * Int

Expr

Term

Expr

-

Term

Term

Int

*

Int 2

Int

Parsing Example

21 of 58

Match

Input

Token!

Start

Parse

Tree

Sentential Form

Remaining Input

2*2

2 - 2* Int

Expr

Term

Expr

-

Term

Term

Int

*

Int 2

Int 2

Parsing Example

22 of 58

Match

Input

Token!

Start

Parse

Tree

Sentential Form

Remaining Input

*2

2 - 2* Int

Expr

Term

Expr

-

Term

Term

Int

*

Int 2

Int 2

Parsing Example

23 of 58

Match

Input

Token!

Start

Parse

Tree

Sentential Form

Remaining Input

2

2 - 2* Int

Expr

Term

Expr

-

Term

Term

Int

*

Int 2

Int 2

Parsing Example

24 of 58

Parsing Example

Start

Parse

Tree

Sentential Form

Remaining Input

2

2 - 2*2

Expr

Term

Expr

-

Term

Term

Int 2

*

Int 2

Int 2

Parse

Complete!

25 of 58

Summary

  • Three Actions (Mechanisms)
    • Apply production to expand current nonterminal in parse tree
    • Match current terminal (consuming input)
    • Accept the parse as correct
  • Parser generates preorder traversal of parse tree
    • visit parents before children
    • visit siblings from left to right

26 of 58

Act 1

27 of 58

Policy Problem

  • Which production to use for each nonterminal?
  • Classical Separation of Policy and Mechanism
  • One Approach: Backtracking
    • Treat it as a search problem
    • At each choice point, try next alternative
    • If it is clear that current try fails, go back to previous choice and try something different
  • General technique for searching
  • Used a lot in classical AI and natural language processing (parsing, speech recognition)

28 of 58

Backtracking Example

Start

Parse

Tree

Sentential Form

Remaining Input

2-2*2

Start

29 of 58

Applied Production

StartExpr

Backtracking Example

Start

Parse

Tree

Sentential Form

Remaining Input

2-2*2

Expr

Expr

30 of 58

Applied Production

Expr Expr + Term

Backtracking Example

Start

Parse

Tree

Sentential Form

Remaining Input

2-2*2

Expr + Term

Expr

Term

Expr

+

31 of 58

Applied Production

Expr Term

Backtracking Example

Start

Parse

Tree

Sentential Form

Remaining Input

2-2*2

Term + Term

Expr

Term

Expr

+

Term

32 of 58

Applied Production

Term Int

Backtracking Example

Start

Parse

Tree

Sentential Form

Remaining Input

2-2*2

Int + Term

Expr

Term

Expr

+

Term

Int

Match

Input

Token!

33 of 58

Applied Production

Term Int

Backtracking Example

Start

Parse

Tree

Sentential Form

Remaining Input

-2*2

2 + Term

Expr

Term

Expr

+

Term

Int 2

Cant

Match

Input

Token!

34 of 58

Applied Production

StartExpr

Backtracking Example

Start

Parse

Tree

Sentential Form

Remaining Input

2-2*2

Expr

Expr

So

Backtrack!

35 of 58

Applied Production

Expr Expr - Term

Backtracking Example

Start

Parse

Tree

Sentential Form

Remaining Input

2-2*2

Expr - Term

Expr

Term

Expr

-

36 of 58

Term

Applied Production

Expr Term

Backtracking Example

Start

Parse

Tree

Sentential Form

Remaining Input

2-2*2

Term - Term

Expr

Term

Expr

-

37 of 58

Term

Applied Production

Term Int

Backtracking Example

Start

Parse

Tree

Sentential Form

Remaining Input

2-2*2

Int - Term

Expr

Term

Expr

-

Int

38 of 58

Match

Input

Token!

Term

Backtracking Example

Start

Parse

Tree

Sentential Form

Remaining Input

-2*2

2 - Term

Expr

Term

Expr

-

Int 2

39 of 58

Match

Input

Token!

Term

Backtracking Example

Start

Parse

Tree

Sentential Form

Remaining Input

2*2

2 - Term

Expr

Term

Expr

-

Int 2

40 of 58

Act 2

41 of 58

Any one see a problem?

  • Example Production: TermTerm*Num
  • Potential parsing steps:

Term

Num

*

Term

Term

Term

Num

*

Term

Term

Num

*

42 of 58

Left Recursion + Top-Down Parsing = Infinite Loop

  • Example Production: TermTerm*Num
  • Potential parsing steps:

Term

Num

*

Term

Term

Term

Num

*

Term

Term

Num

*

43 of 58

General Search Issues

  • Three components
    • Search space (parse trees)
    • Search algorithm (parsing algorithm)
    • Goal to find (parse tree for input program)
  • Would like to (but cant always) ensure that
    • Find goal (hopefully quickly) if it exists
    • Search terminates if it does not
  • Handled in various ways in various contexts
    • Finite search space makes it easy
    • Exploration strategies for infinite search space
    • Sometimes one goal more important (model checking)
  • For parsing, hack grammar to remove left recursion

44 of 58

Eliminating Left Recursion

  • Start with productions of form
    • A →A α
    • A → β
    • α, β sequences of terminals and nonterminals that do not start with A
  • Repeated application of A →A α

builds parse tree like this:

A

α

A

α

A

α

β

45 of 58

Eliminating Left Recursion

  • Replacement productions
    • A →A α A → β R R is a new nonterminal
    • A → β R → α R
    • R → ε

A

α

A

α

β

A

β

R

α

R

α

R

ε

Old Parse Tree

New Parse Tree

46 of 58

Hacked Grammar

Original Grammar Fragment

TermTerm * Int

TermTerm / Int

Term → Int

New Grammar Fragment

Term → Int Term

Term* Int Term

Term/ Int Term

Term ε

47 of 58

Term

Int

*

Term

Int

*

Int

Term

Int

Term

Int

*

Term

Int

*

Term

ε

Parse Tree Comparisons

Original Grammar

New Grammar

48 of 58

Eliminating Left Recursion

  • Changes search space exploration algorithm
    • Eliminates direct infinite recursion
    • But grammar less intuitive
  • Sets things up for predictive parsing

49 of 58

Act 3

50 of 58

Predictive Parsing

  • Alternative to backtracking
  • Useful for programming languages, which can be designed to make parsing easier
  • Basic idea
    • Look ahead in input stream
    • Decide which production to apply based on next tokens in input stream
    • We will use one token of lookahead
  • (Getting closer in form to our recognition approach – but not quite…see 6.1100!)

51 of 58

Predictive Parsing Example Grammar

StartExpr

ExprTerm Expr’

Expr’ → + Term Expr’

Expr’ → - Term Expr’

Expr’ ε

Term → Int Term’

Term’* Int Term’

Term’/ Int Term’

Term’ ε

52 of 58

Choice Points

  • Assume Term is current position in parse tree
  • Have three possible productions to apply

Term* Int Term

Term/ Int Term

Term → ε

  • Use next token to decide
    • If next token is *, apply Term* Int Term
    • If next token is /, apply Term/ Int Term
    • Otherwise, apply Term → ε

53 of 58

Predictive Parsing Example

Start

Parse

Tree

Sentential Form

Remaining Input

*2

2 - 2 Term’

Expr

Term

Term

Int 2

Term’

ε

Expr’

Term

Int 2

Term’

-

Term → Int Term’

Term’* Int Term’

Term’/ Int Term’

Term’ ε

54 of 58

Predictive Parsing Example

Start

Parse

Tree

Sentential Form

Remaining Input

*2

2 - 2 Term’

Expr

Term

Term

Int 2

Term’

ε

Expr’

Int 2

Term’

-

Applied Production

Term’ * Int Term’

*

Term’

Int

55 of 58

Predictive Parsing + Hand Coding = Recursive Descent Parser

  • One procedure per nonterminal NT
    • Productions NT → β1 , …, NT → βn
    • Procedure examines the current input symbol T to determine which production to apply
      • If T∈Firstk) (we still discuss first next lecture)
      • Apply production k
      • Consume terminals in βk (check for correct terminal)
      • Recursively call procedures for nonterminals in βk
    • Current input symbol stored in global variable token
  • Procedures return the section of the parse tree for the part of the string it parsed

56 of 58

Example

Boolean Term()

if (token = Int n) token = NextToken(); return(TermPrime())

else return(false)

Boolean TermPrime()

if (token = *)

token = NextToken();

if (token = Int n) token = NextToken(); return(TermPrime())

else return(false)

else if (token = /)

token = NextToken();

if (token = Int n) token = NextToken(); return(TermPrime())

else return(false)

else return(true)

Term → Int Term

Term* Int Term

Term/ Int Term

Term → ε

57 of 58

Next time: Parse

Term()

if (token = Int n)

oldToken = token; token = NextToken();

node = TermPrime();

if (node == NULL) return oldToken;

else return(new TermNode(oldToken, node);

else throw SyntaxError

TermPrime()

if (token = *) || (token = /)

first = token; next = NextToken();

if (next = Int n)

token = NextToken();

return(new TermPrimeNode(first, next, TermPrime())

else throw SyntaxError

else return(NULL)

Term → Int Term

Term* Int Term

Term/ Int Term

Term → ε

58 of 58

Building A Parse Tree

  • Have each procedure return the section of the parse tree for the part of the string it parsed
  • Use exceptions to make code structure clean

Term

Int

2

Term

Int

3

*

Term

Int

4

*

Term

ε

Parse Tree