1 of 24

3) Removing Empty Strings

Ensure only top-level symbol can be nullable

program ::= stmtSeq� stmtSeq ::= stmt | stmt ; stmtSeq� stmt ::= “” | assignment | whileStmt | blockStmt� blockStmt ::= { stmtSeq }� assignment ::= expr = expr� whileStmt ::= while (expr) stmt� expr ::= identifier

How to do it in this example?

2 of 24

3) Removing Empty Strings - Result

program ::= “” | stmtSeq � stmtSeq ::= stmt| stmt ; stmtSeq | � | ; stmtSeq | stmt ; | ;� stmt ::= assignment | whileStmt | blockStmt� blockStmt ::= { stmtSeq } | { }� assignment ::= expr = expr� whileStmt ::= while (expr) stmt� whileStmt ::= while (expr)� expr ::= identifier

3 of 24

3) Removing Empty Strings - Algorithm

  •  

 

4 of 24

3) Removing Empty Strings

  • Since stmtSeq is nullable, the rule� blockStmt ::= { stmtSeq }�gives� blockStmt ::= { stmtSeq } | { }
  • Since stmtSeq and stmt are nullable, the rule� stmtSeq ::= stmt | stmt ; stmtSeq�gives� stmtSeq ::= stmt | stmt ; stmtSeq � | ; stmtSeq | stmt ; | ;

5 of 24

4) Eliminating unit productions

  • Single production is of the form

X ::=Y

where X,Y are non-terminals

program ::= stmtSeq� stmtSeq ::= stmt � | stmt ; stmtSeq� stmt ::= assignment | whileStmt� assignment ::= expr = expr� whileStmt ::= while (expr) stmt

6 of 24

4) Unit Production Elimination Algorithm

  • If there is a unit production

X ::=Y put an edge (X,Y) into graph

  • If there is a path from X to Z in the graph, and there is rule Z ::= s1 s2 … sn then add rule

X ::= s1 s2 … sn

At the end, remove all unit productions.

7 of 24

4) Eliminate unit productions - Result

program ::= expr = expr | while (expr) stmt � | stmt ; stmtSeq� stmtSeq ::= expr = expr | while (expr) stmt � | stmt ; stmtSeq� stmt ::= expr = expr | while (expr) stmt � assignment ::= expr = expr� whileStmt ::= while (expr) stmt

8 of 24

5) Reducing Arity:�No more than 2 symbols on RHS�

stmt ::= while (expr) stmt

becomes

stmt ::= while stmt1� stmt1 ::= ( stmt2� stmt2 ::= expr stmt3� stmt3 ::= ) stmt

9 of 24

6) A non-terminal for each terminal

stmt ::= while (expr) stmt

becomes

stmt ::= Nwhile stmt1� stmt1 ::= N( stmt2� stmt2 ::= expr stmt3� stmt3 ::= N) stmt� Nwhile ::= while� N( ::= (� N) ::= )

10 of 24

Order of steps in conversion to CNF

    • remove unproductive symbols (optional)
    • remove unreachable symbols (optional)
    • make terminals occur alone on right-hand side
    • Reduce arity of every production to <= 2
    • remove epsilons
    • remove unit productions X::=Y
    • unproductive symbols
    • unreachable symbols
    • What if we swap the steps 4 and 5 ?
      • Potentially exponential blow-up in the # of productions

11 of 24

Ordering of �Unreachable / Unproductive symbols

S := B C | “”

C := D

D := a

R := r

First Unreachable then Unproductive

S := “”

C := D

D := a

S := B C | “”

C := D

D := a

S := B C | “”

C := D

D := C

R := r

First Unproductive then Unreachable

S := “”

S := “”

C := D

D := a

R := r

12 of 24

Exercises on LL(1) grammars

Exercise 1:

Consider a grammar for expressions where the multiplication sign is optional.

ex ::= ex + ex | ex * ex | ex ex |ID

  • Find an LL(1) grammar recognizing the same language
  • Create the LL(1) parsing table.

13 of 24

Exercise 1 – Solution

  •  

14 of 24

Exercise 1 – LL(1) parsing table

  •  

15 of 24

LL(1) parsing table

  •  

ID

+

*

EOF

ex

1

Error

Error

Error

Z

Error

2

Error

3

S

4

Error

Error

Error

Z2

6

7

5

7

16 of 24

Exercise 2

Prove that every LL(1) grammar is unambiguous.

Intuition:

There is a unique way to derive / parse a string because, for any given non-terminal, at most one alternative is applicable on scanning an input character.

This means we have unique left most derivations / parse trees for every string

Formal proof is presented in the next slide

17 of 24

Solution to Exercise 2 [Cont.]

  •  

18 of 24

Solution to Exercise 2 [Cont.]

  •  

19 of 24

Corollary of the proof

  •  

20 of 24

Exercise 3

  •  

21 of 24

Solution to Exercise 3

  •  

22 of 24

Exercise 4

Show that the regular languages can be recognized with LL(1) parsers. Describe a process that, given a regular expression, constructs an LL(1) parser for it.

23 of 24

Solution for Exercise 4

  •  

24 of 24

Exercise 5

  •