Compiler course
Chapter 4
Syntax Analysis
Outline
The role of parser
Lexical Analyzer
Parser
Source
program
token
getNext
Token
Symbol
table
Parse tree
Rest of Front End
Intermediate
representation
Uses of grammars
E -> E + T | T
T -> T * F | F
F -> (E) | id
E -> TE’
E’ -> +TE’ | Ɛ
T -> FT’
T’ -> *FT’ | Ɛ
F -> (E) | id
Error handling
Error-recover strategies
Context free grammars
expression -> expression + term
expression -> expression – term
expression -> term
term -> term * factor
term -> term / factor
term -> factor
factor -> (expression)
factor -> id
Derivations
Parse trees
Ambiguity
Elimination of ambiguity
Elimination of ambiguity (cont.)
Elimination of left recursion
+
Left recursion elimination (cont.)
Left factoring
Left factoring (cont.)
Top Down Parsing
Introduction
E -> TE’
E’ -> +TE’ | Ɛ
T -> FT’
T’ -> *FT’ | Ɛ
F -> (E) | id
E
lm
E
T
E’
lm
E
T
E’
F
T’
lm
E
T
E’
F
T’
id
lm
E
T
E’
F
T’
id
Ɛ
lm
E
T
E’
F
T’
id
Ɛ
+
T
E’
Recursive descent parsing
void A() {
choose an A-production, A->X1X2..Xk
for (i=1 to k) {
if (Xi is a nonterminal
call procedure Xi();
else if (Xi equals the current input symbol a)
advance the input to the next symbol;
else /* an error has occurred */
}
}
Recursive descent parsing (cont)
Example
S->cAd
A->ab | a
Input: cad
S
c
A
d
S
c
A
d
a
b
S
c
A
d
a
First and Follow
*
*
Computing First
*
*
Computing follow
LL(1) Grammars
*
Construction of predictive parsing table
Example
E -> TE’
E’ -> +TE’ | Ɛ
T -> FT’
T’ -> *FT’ | Ɛ
F -> (E) | id
F
T
E
E’
T’
First
Follow
{(,id}
{(,id}
{(,id}
{+,ɛ}
{*,ɛ}
{+, *, ), $}
{+, ), $}
{+, ), $}
{), $}
{), $}
E
E’
T
T’
F
Non -
terminal
Input Symbol
id
+
*
(
)
$
E -> TE’
E -> TE’
E’ -> +TE’
E’ -> Ɛ
E’ -> Ɛ
T -> FT’
T -> FT’
T’ -> *FT’
T’ -> Ɛ
T’ -> Ɛ
T’ -> Ɛ
F -> (E)
F -> id
Another example
S -> iEtSS’ | a
S’ -> eS | Ɛ
E -> b
S
S’
E
Non -
terminal
Input Symbol
a
b
e
i
t
$
S -> a
S -> iEtSS’
S’ -> Ɛ
S’ -> eS
S’ -> Ɛ
E -> b
Non-recursive predicting parsing
a
+
b
$
Predictive
parsing
program
output
Parsing
Table
M
stack
X
Y
Z
$
Predictive parsing algorithm
Set ip point to the first symbol of w;
Set X to the top stack symbol;
While (X<>$) { /* stack is not empty */
if (X is a) pop the stack and advance ip;
else if (X is a terminal) error();
else if (M[X,a] is an error entry) error();
else if (M[X,a] = X->Y1Y2..Yk) {
output the production X->Y1Y2..Yk;
pop the stack;
push Yk,…,Y2,Y1 on to the stack with Y1 on top;
}
set X to the top stack symbol;
}
Example
Matched
Stack
Input
Action
E$
id+id*id$
Error recovery in predictive parsing
Example
E
E’
T
T’
F
Non -
terminal
Input Symbol
id
+
*
(
)
$
E -> TE’
E -> TE’
E’ -> +TE’
E’ -> Ɛ
E’ -> Ɛ
T -> FT’
T -> FT’
T’ -> *FT’
T’ -> Ɛ
T’ -> Ɛ
T’ -> Ɛ
F -> (E)
F -> id
synch
synch
synch
synch
synch
synch
synch
synch
synch
Stack
Input
Action
E$
)id*+id$
Error, Skip )
E$
id*+id$
id is in First(E)
TE’$
id*+id$
FT’E’$
id*+id$
idT’E’$
id*+id$
T’E’$
*+id$
*FT’E’$
*+id$
+id$
FT’E’$
Error, M[F,+]=synch
+id$
T’E’$
F has been poped
Bottom-up Parsing
Introduction
E -> E + T | T
T -> T * F | F
F -> (E) | id
id
F * id
id*id
T * id
id
F
T * F
id
F
id
T * F
id
F
id
F
T * F
id
F
id
F
E
Shift-reduce parser
Handle pruning
Right sentential form
Handle
Reducing production
id*id
id
F->id
F*id
F
id
T->F
T*id
F->id
T*F
T*F
E->T*F
Shift reduce parsing
Stack Input
$ w$
Stack Input
$S $
Shift reduce parsing (cont.)
Stack
Input
Action
$
$id
id*id$
shift
*id$
reduce by F->id
$F
*id$
reduce by T->F
$T
*id$
shift
$T*
id$
shift
$T*id
$
reduce by F->id
$T*F
$
reduce by T->T*F
$T
$
reduce by E->T
$E
$
accept
Handle will appear on top of the stack
S
A
B
α
β
γ
y
z
Stack
Input
$αβγ
yz$
$αβB
yz$
$αβBy
z$
S
A
B
α
γ
y
z
x
Stack
Input
$αγ
xyz$
$αBxy
z$
Conflicts during shit reduce parsing
Stack
Input
else …$
… if expr then stmt
Reduce/reduce conflict
stmt -> id(parameter_list)
stmt -> expr:=expr
parameter_list->parameter_list, parameter
parameter_list->parameter
parameter->id
expr->id(expr_list)
expr->id
expr_list->expr_list, expr
expr_list->expr
Stack
Input
,id) …$
… id(id
LR Parsing
States of an LR parser
Constructing canonical LR(0) item sets
E’->E
E -> E + T | T
T -> T * F | F
F -> (E) | id
I0=closure({[E’->.E]}
E’->.E
E->.E+T
E->.T
T->.T*F
T->.F
F->.(E)
F->.id
Constructing canonical LR(0) item sets (cont.)
I0=closure({[E’->.E]}
E’->.E
E->.E+T
E->.T
T->.T*F
T->.F
F->.(E)
F->.id
E
I1
E’->E.
E->E.+T
I2
E’->T.
T->T.*F
T
I4
F->(.E)
E->.E+T
E->.T
T->.T*F
T->.F
F->.(E)
F->.id
(
Closure algorithm
SetOfItems CLOSURE(I) {
J=I;
repeat
for (each item A-> α.Bβ in J)
for (each prodcution B->γ of G)
if (B->.γ is not in J)
add B->.γ to J;
until no more items are added to J on one round;
return J;
GOTO algorithm
SetOfItems GOTO(I,X) {
J=empty;
if (A-> α.X β is in I)
add CLOSURE(A-> αX. β ) to J;
return J;
}
Canonical LR(0) items
Void items(G’) {
C= CLOSURE({[S’->.S]});
repeat
for (each set of items I in C)
for (each grammar symbol X)
if (GOTO(I,X) is not empty and not in C)
add GOTO(I,X) to C;
until no new set of items are added to C on a round;
}
Example
E’->E
E -> E + T | T
T -> T * F | F
F -> (E) | id
I0=closure({[E’->.E]}
E’->.E
E->.E+T
E->.T
T->.T*F
T->.F
F->.(E)
F->.id
E
I1
E’->E.
E->E.+T
I2
E’->T.
T->T.*F
T
I4
F->(.E)
E->.E+T
E->.T
T->.T*F
T->.F
F->.(E)
F->.id
(
I5
F->id.
id
I3
T>F.
+
I6
E->E+.T
T->.T*F
T->.F
F->.(E)
F->.id
*
I7
T->T*.F
F->.(E)
F->.id
E
I8
E->E.+T
F->(E.)
)
I11
F->(E).
I9
E->E+T.
T->T.*F
T
I10
T->T*F.
F
id
+
$
acc
Use of LR(0) automaton
Line | Stack | Symbols | Input | Action |
(1) | 0 | $ | id*id$ | Shift to 5 |
(2) | 05 | $id | *id$ | Reduce by F->id |
(3) | 03 | $F | *id$ | Reduce by T->F |
(4) | 02 | $T | *id$ | Shift to 7 |
(5) | 027 | $T* | id$ | Shift to 5 |
(6) | 0275 | $T*id | $ | Reduce by F->id |
(7) | 02710 | $T*F | $ | Reduce by T->T*F |
(8) | 02 | $T | $ | Reduce by E->T |
(9) | 01 | $E | $ | accept |
LR-Parsing model
a1 | … | ai | … | an | $ |
INPUT
LR Parsing Program
Sm |
Sm-1 |
… |
$ |
ACTION | GOTO |
Output
LR parsing algorithm
let a be the first symbol of w$;
while(1) { /*repeat forever */
let s be the state on top of the stack;
if (ACTION[s,a] = shift t) {
push t onto the stack;
let a be the next input symbol;
} else if (ACTION[s,a] = reduce A->β) {
pop |β| symbols of the stack;
let state t now be on top of the stack;
push GOTO[t,A] onto the stack;
output the production A->β;
} else if (ACTION[s,a]=accept) break; /* parsing is done */
else call error-recovery routine;
}
Example
(0) E’->E
(1) E -> E + T
(2) E-> T
(3) T -> T * F
(4) T-> F
(5) F -> (E)
(6) F->id
STATE | ACTON | GOTO | |||||||
| id | + | * | ( | ) | $ | E | T | F |
0 | S5 | | | S4 | | | 1 | 2 | 3 |
1 | | S6 | | | | Acc | | | |
2 | | R2 | S7 | | R2 | R2 | | | |
3 | | R4 | R7 | | R4 | R4 | | | |
4 | S5 | | | S4 | | | 8 | 2 | 3 |
5 | | R6 | R6 | | R6 | R6 | | | |
6 | S5 | | | S4 | | | | 9 | 3 |
7 | S5 | | | S4 | | | | | 10 |
8 | | S6 | | | S11 | | | | |
9 | | R1 | S7 | | R1 | R1 | | | |
10 | | R3 | R3 | | R3 | R3 | | | |
11 | | R5 | R5 | | R5 | R5 | | | |
id*id+id?
Line | Stack | Symbols | Input | Action |
(1) | 0 | | id*id+id$ | Shift to 5 |
(2) | 05 | id | *id+id$ | Reduce by F->id |
(3) | 03 | F | *id+id$ | Reduce by T->F |
(4) | 02 | T | *id+id$ | Shift to 7 |
(5) | 027 | T* | id+id$ | Shift to 5 |
(6) | 0275 | T*id | +id$ | Reduce by F->id |
(7) | 02710 | T*F | +id$ | Reduce by T->T*F |
(8) | 02 | T | +id$ | Reduce by E->T |
(9) | 01 | E | +id$ | Shift |
(10) | 016 | E+ | id$ | Shift |
(11) | 0165 | E+id | $ | Reduce by F->id |
(12) | 0163 | E+F | $ | Reduce by T->F |
(13) | 0169 | E+T` | $ | Reduce by E->E+T |
(14) | 01 | E | $ | accept |
Constructing SLR parsing table
Example grammar which is not SLR(1)
S -> L=R | R
L -> *R | id
R -> L
I0
S’->.S
S -> .L=R
S->.R
L -> .*R |
L->.id
R ->. L
I1
S’->S.
I2
S ->L.=R
R ->L.
I3
S ->R.
I4
L->*.R
R->.L
L->.*R
L->.id
I5
L -> id.
I6
S->L=.R
R->.L
L->.*R
L->.id
I7
L -> *R.
I8
R -> L.
I9
S -> L=R.
Action
=
2
Shift 6
Reduce R->L
More powerful LR parsers
Canonical LR(1) items
*
rm
S=>aaBab=>aaaBab
*
rm
Item [B->a.B,a] is valid for γ=aaa
and w=ab
Constructing LR(1) sets of items
SetOfItems Closure(I) {
repeat
for (each item [A->α.Bβ,a] in I)
for (each production B->γ in G’)
for (each terminal b in First(βa))
add [B->.γ, b] to set I;
until no more items are added to I;
return I;
}
SetOfItems Goto(I,X) {
initialize J to be the empty set;
for (each item [A->α.Xβ,a] in I)
add item [A->αX.β,a] to set J;
return closure(J);
}
void items(G’){
initialize C to Closure({[S’->.S,$]});
repeat
for (each set of items I in C)
for (each grammar symbol X)
if (Goto(I,X) is not empty and not in C)
add Goto(I,X) to C;
until no new sets of items are added to C;
}
Example
S’->S
S->CC
C->cC
C->d
Canonical LR(1) parsing table
Example
S’->S
S->CC
C->cC
C->d
LALR Parsing Table
I4
C->d. , c/d
I7
C->d. , $
I47
C->d. , c/d/$
Example of RR conflict in state merging
S’->S
S -> aAd | bBd | aBe | bAe
A -> c
B -> c
An easy but space-consuming LALR table construction
Compaction of LR parsing table
Using ambiguous grammars
E->E+E
E->E*E
E->(E)
E->id
I0: E’->.E
E->.E+E
E->.E*E
E->.(E)
E->.id
I1: E’->E.
E->E.+E
E->E.*E
I2: E->(.E)
E->.E+E
E->.E*E
E->.(E)
E->.id
I3: E->.id
I4: E->E+.E
E->.E+E
E->.E*E
E->.(E)
E->.id
I5: E->E*.E
E->(.E)
E->.E+E
E->.E*E
E->.(E)
E->.id
I6: E->(E.)
E->E.+E
E->E.*E
I7: E->E+E.
E->E.+E
E->E.*E
I8: E->E*E.
E->E.+E
E->E.*E
I9: E->(E).
STATE | ACTON | GOTO | |||||
| id | + | * | ( | ) | $ | E |
0 | S3 | | | S2 | | | 1 |
1 | | S4 | S5 | | | Acc | |
2 | S3 | | S2 | | | | 6 |
3 | | R4 | R4 | | R4 | R4 | |
4 | S3 | | | S2 | | | 7 |
5 | S3 | | | S2 | | | 8 |
6 | | S4 | S5 | | | | |
7 | | R1 | S5 | | R1 | R1 | |
8 | | R2 | R2 | | R2 | R2 | |
9 | | R3 | R3 | | R3 | R3 | |
Readings