6.1120: Parsing
Top-Down Parsing
Example
Boolean Term()
if (token = Int n) token = NextToken(); return(TermPrime())
else return(false)
Boolean TermPrime()
if (token = *)
token = NextToken();
if (token = Int n) token = NextToken(); return(TermPrime())
else return(false)
else if (token = /)
token = NextToken();
if (token = Int n) token = NextToken(); return(TermPrime())
else return(false)
else return(true)
Term → Int Term’
Term’ → * Int Term’
Term’ → / Int Term’
Term’ → ε
Example: the skeleton
Generative Approach: �<2-1>+1
Start
Expr
Expr Op Expr
Open Expr Close Op Expr
…
Op = +|-|*|/
Int = [0-9] [0-9]*
Open = <
Close = >
1) Start → Expr
2) Expr → Expr Op Expr
3) Expr → Int
4) Expr → Open Expr Close
Which production to choose?
Which nonterminal �to expand?
Top-Down Parsing Algorithm So Far
Fundamentally, there are two design choices:
Yesterday: fast overview of everything
Today: systematic thinking for predictive parsing.
Predictive Parsing + Hand Coding = Recursive Descent Parser
Example
Boolean Term()
if (token = Int n) token = NextToken(); return(TermPrime())
else return(false)
Boolean TermPrime()
if (token = *)
token = NextToken();
if (token = Int n) token = NextToken(); return(TermPrime())
else return(false)
else if (token = /)
token = NextToken();
if (token = Int n) token = NextToken(); return(TermPrime())
else return(false)
else return(true)
Term → Int Term’
Term’ → * Int Term’
Term’ → / Int Term’
Term’ → ε
Nonterminals
NT → NT1 α1
NT → NT2 α 2
First(β)
1) T∈First(T )
2) First(S ) ⊆ First(S β)
3) NT derives ε implies First(β) ⊆ First(NT β)
4) NT → S β implies First(S β) ⊆ First(NT )
First(β)
1) T∈First(T )
2) First(S ) ⊆ First(S β)
3) NT derives ε implies First(β) ⊆ First(NT β)
4) NT → S β implies First(S β) ⊆ First(NT )
NT derives ε
Fixed Point Algorithm for Derives ε
for all nonterminals NT
set NT derives ε to be false
for all productions of the form NT → ε
set NT derives ε to be true
while (some NT derives ε changed in last iteration)
for all productions of the form NT → NT1 ... NTn
if (for all 1≤i ≤n NTi derives ε)
set NT derives ε to be true
First(β)
1) T∈First(T )
2) First(S ) ⊆ First(S β)
3) NT derives ε implies First(β) ⊆ First(NT β)
4) NT → S β implies First(S β) ⊆ First(NT )
Rules + Request Generate System of Subset Inclusion Constraints
Grammar
Term’ → * Int Term’
Term’ → / Int Term’
Term’ → ε
Request: What is First(Term’ )?
Constraints
First(* Int Term’ ) ⊆ First(Term’ )
First(/ Int Term’ ) ⊆ First(Term’ )
First(*) ⊆ First(* Int Term’ )
First(/) ⊆ First(/ Int Term’ )
*∈First(*)
/ ∈First(/)
Rules
1) T∈First(T )
2) First(S) ⊆ First(S β)
3) NT derives ε implies First(β) ⊆ First(NT β)
4) NT → S β implies
First(S β) ⊆ First(NT )
Constraint Propagation Algorithm
Constraints
First(* Int Term’ ) ⊆ First(Term’ )
First(/ Int Term’ ) ⊆ First(Term’ )
First(*) ⊆ First(* Int Term’ )
First(/) ⊆ First(/ Int Term’ )
*∈First(*)
/ ∈First(/)
Solution
First(Term’ ) = {}
First(* Int Term’ ) = {}
First(/ Int Term’ ) = {}
First(*) = {}
First(/) = {}
Initialize Sets to {}
Propagate Constraints Until Fixed Point
Constraint Propagation Algorithm
Constraints
First(* Num Term’ ) ⊆ First(Term’ )
First(/ Num Term’ ) ⊆ First(Term’ )
First(*) ⊆ First(* Num Term’ )
First(/) ⊆ First(/ Num Term’ )
*∈First(*)
/ ∈First(/)
Solution
First(Term’ ) = {}
First(* Num Term’ ) = {}
First(/ Num Term’ ) = {}
First(*) = {}
First(/) = {}
Grammar
Term’ → * Int Term’
Term’ → / Int Term’
Term’ → ε
Constraint Propagation Algorithm
Constraints
First(* Num Term’ ) ⊆ First(Term’ )
First(/ Num Term’ ) ⊆ First(Term’ )
First(*) ⊆ First(* Num Term’ )
First(/) ⊆ First(/ Num Term’ )
*∈First(*)
/ ∈First(/)
Solution
First(Term’ ) = {}
First(* Num Term’ ) = {}
First(/ Num Term’ ) = {}
First(*) = {*}
First(/) = {/}
Grammar
Term’ → * Int Term’
Term’ → / Int Term’
Term’ → ε
Constraint Propagation Algorithm
Solution
First(Term’ ) = {}
First(* Num Term’ ) = {*}
First(/ Num Term’ ) = {/}
First(*) = {*}
First(/) = {/}
Constraints
First(* Num Term’ ) ⊆ First(Term’ )
First(/ Num Term’ ) ⊆ First(Term’ )
First(*) ⊆ First(* Num Term’ )
First(/) ⊆ First(/ Num Term’ )
*∈First(*)
/ ∈First(/)
Grammar
Term’ → * Int Term’
Term’ → / Int Term’
Term’ → ε
Constraint Propagation Algorithm
Solution
First(Term’ ) = {*,/}
First(* Num Term’ ) = {*}
First(/ Num Term’ ) = {/}
First(*) = {*}
First(/) = {/}
Constraints
First(* Num Term’ ) ⊆ First(Term’ )
First(/ Num Term’ ) ⊆ First(Term’ )
First(*) ⊆ First(* Num Term’ )
First(/) ⊆ First(/ Num Term’ )
*∈First(*)
/ ∈First(/)
Grammar
Term’ → * Int Term’
Term’ → / Int Term’
Term’ → ε
Predictive Parsing + Hand Coding = Recursive Descent Parser
Example
Boolean Term()
if (token = Int n) token = NextToken(); return(TermPrime())
else return(false)
Boolean TermPrime()
if (token = *)
token = NextToken();
if (token = Int n) token = NextToken(); return(TermPrime())
else return(false)
else if (token = /)
token = NextToken();
if (token = Int n) token = NextToken(); return(TermPrime())
else return(false)
else return(true)
Term → Int Term’
Term’ → * Int Term’
Term’ → / Int Term’
Term’ → ε
First(* Num Term’ ) = {*}
First(/ Num Term’ ) = {/}
Act 5
Multiple Productions With Same Prefix in RHS
NT → if then
NT → if then else
Solution: Left Factor the Grammar
NT → if then NT’
NT’ → else
NT’ → ε
Act 6
How far we have come
Unambiguous/Precedence
Term → Term * Int
Term → Term / Int
Term → Int
Eliminate Left Recursion�
Term → Int Term’
Term’ → * Int Term’
Term’ → / Int Term’
Term’ → ε
Original Grammar Fragment�
Expr → Expr Op Expr
Expr → Int
Op = (* | /)
Building A Parse Tree
Term
Int
2
Term’
Int
3
*
Term’
Int
4
*
Term’
ε
Parse Tree
Building Parse Tree In Example
Term()
if (token = Int n)
oldToken = token; token = NextToken();
node = TermPrime();
if (node == NULL) return oldToken;
else return(new TermNode(oldToken, node);
else throw SyntaxError
TermPrime()
if (token = *) || (token = /)
first = token; next = NextToken();
if (next = Int n)
token = NextToken();
return(new TermPrimeNode(first, next, TermPrime())
else throw SyntaxError
else return(NULL)
Why Use Hand-Coded Parser?
Bottom Line
Summary
Backup
How far we have come
Unambiguous/Precedence
Term → Term * Int
Term → Term / Int
Term → Int
Eliminate Left Recursion�
Term → Int Term’
Term’ → * Int Term’
Term’ → / Int Term’
Term’ → ε
Original Grammar Fragment�
Expr → Expr Op Expr
Expr → Int
Op = (* | /)
Parse Tree for 2*3*4
Term
Int
2
Term’
Int
3
*
Term’
Int
4
*
Term’
ε
Term
Int
3
*
Term
Int
4
*
Int
2
Concrete
Parse Tree
Desired
Abstract
Parse Tree
Direct Generation of Abstract Tree
Term
Int
3
*
Term
Int
4
*
root
incomplete
Missing left child to be filled in by caller
Code for Term
Term()
if (token = Int n)
leftmostInt = token; token = NextToken();
(root, incomplete) = TermPrime();
if (root == NULL) return leftmostInt;
incomplete.leftChild = leftmostInt;
return root;
else throw SyntaxError
Int
2
token
2*3*4
Input to parse
Code for Term
Term()
if (token = Int n)
leftmostInt = token; token = NextToken();
(root, incomplete) = TermPrime();
if (root == NULL) return leftmostInt;
incomplete.leftChild = leftmostInt;
return root;
else throw SyntaxError
Int
2
token
2*3*4
Input to parse
Code for Term
Term()
if (token = Int n)
leftmostInt = token; token = NextToken();
(root, incomplete) = TermPrime();
if (root == NULL) return leftmostInt;
incomplete.leftChild = leftmostInt;
return root;
else throw SyntaxError
Int
2
token
2*3*4
Input to parse
Code for Term
Term()
if (token = Int n)
leftmostInt = token; token = NextToken();
(root, incomplete) = TermPrime();
if (root == NULL) return leftmostInt;
incomplete.leftChild = leftmostInt;
return root;
else throw SyntaxError
Term
Int
3
*
Term
Int
4
*
Int
2
root
incomplete
leftmostInt
2*3*4
Input to parse
Code for Term
Term()
if (token = Int n)
leftmostInt = token; token = NextToken();
(root, incomplete) = TermPrime();
if (root == NULL) return leftmostInt;
incomplete.leftChild = leftmostInt;
return root;
else throw SyntaxError
Term
Int
3
*
Term
Int
4
*
Int
2
root
incomplete
leftmostInt
2*3*4
Input to parse
Code for Term
Term()
if (token = Int n)
leftmostInt = token; token = NextToken();
(root, incomplete) = TermPrime();
if (root == NULL) return leftmostInt;
incomplete.leftChild = leftmostInt;
return root;
else throw SyntaxError
Term
Int
3
*
Term
Int
4
*
Int
2
root
incomplete
leftmostInt
2*3*4
Input to parse
Code for TermPrime
TermPrime()
if (token = *) || (token = /)
op = token; next = NextToken();
if (next = Int n)
token = NextToken();
(root, incomplete) = TermPrime();
if (root == NULL)
root = new ExprNode(NULL, op, next);
return (root, root);
else newChild = new ExprNode(NULL, op, next);
incomplete.leftChild = newChild;
return(root, newChild);
else throw SyntaxError
else return(NULL,NULL)
Missing left child to be filled in by caller