1 of 42

Compiler Design

UNIT -III

1

Compiler Design by Dr. S.Siva Nageswara Rao

2 of 42

2

Compiler Design by Dr. S.Siva Nageswara Rao

3 of 42

Introduction

Why LR parsing:

  • LR parsers can be constructed to recognize virtually all programming-language constructs for which context-free grammars can be written.
  • The LR parsing method is the most general non-backtracking shift-reduce parsing method known, yet it can be implemented as efficiently as other shift-reduce methods.
  • The class of grammars that can be parsed using LR methods is a proper subset of the class of grammars that can be parsed with predictive parsers.
  • An LR parser can detect a syntactic error as soon as it is possible to do so on a left-to-right scan of the input.
  • The disadvantage is that it takes too much work to construct an LR parser by hand for a typical programming-language grammar. But there are lots of LR parser generators available to make this task easy.

3

Compiler Design by Dr. S.Siva Nageswara Rao

4 of 42

Bottom-Up Parsing:

  • Constructing a parse tree for an input string beginning at the leaves and going towards the root is called bottom-up parsing. A general type of bottom-up parser is a shift-reduce parser.

Shift-Reduce Parsing:

Shift-reduce parsing is a type of bottom -up parsing that attempts to construct a parse tree for an input string beginning at the leaves (the bottom) and working up towards the root (the top).

Example: Consider the grammar:

S → aABe

A → Abc | b

B → d

  • The string to be recognized is abbcde. We want to reduce the string to S.

4

Compiler Design by Dr. S.Siva Nageswara Rao

5 of 42

Steps of reduction:

  • abbcde (b,d can be reduced)
  • aAbcde (leftmost b is reduced)
  • aAde (now Abc,b,d qualified for reduction)
  • aABe (d can be reduced)
  • S
  • Each replacement of the right side of a production by the left side in the above example is called reduction, which is equivalent to rightmost derivation in reverse.

Handle:

A substring which is the right side of a production such that replacement of that substring by the production left side leads eventually to a reduction to the start symbol, by the reverse of a rightmost derivation is called a handle.

5

Compiler Design by Dr. S.Siva Nageswara Rao

6 of 42

Stack Implementation of Shift-Reduce Parsing:

  • There are two problems that must be solved if we are to parse by handle pruning.
  • The first is to locate the substring to be reduced in a right-sentential form, and the second is to determine what production to choose in case there is more than one production with that substring on the right side.
  • A convenient way to implement a shift-reduce parser is to use a stack to hold grammar symbols and an input buffer to hold the string w to be parsed.
  • We use $ to mark the bottom of the stack and also the right end of the input. Initially, the stack is empty, and the string w is on the input, as follows: STACK INPUT

$ w$

  • The parser operates by shifting zero or more input symbols onto the stack until a handle is on top of the stack. The parser repeats this cycle until it has detected an error or until the stack contains the start symbol and the input is empty: STACK INPUT

$ S $

6

Compiler Design by Dr. S.Siva Nageswara Rao

7 of 42

Example: The actions a shift-reduce parser in parsing the input string id1+id2*id3, according to the ambiguous grammar for arithmetic expression

Configuration of Shift Reduce Parser on input id1+id2*id3

7

Compiler Design by Dr. S.Siva Nageswara Rao

8 of 42

While the primary operations of the parser are shift and reduce, there are actually four possible actions a shift-reduce parser can make:

(1) shift, (2) reduce,(3) accept, and (4) error.

• In a shift action, the next input symbol is shifted unto the top of the stack.

• In a reduce action, the parser knows the right end of the handle is at the top of the stack. It must then locate the left end of the handle within the stack and decide with what non-terminal to replace the handle.

• In an accept action, the parser announces successful completion of parsing.

• In an error action, the parser discovers that a syntax error has occurred and calls an error recovery routine

8

Compiler Design by Dr. S.Siva Nageswara Rao

9 of 42

A stack implementation of a Shift-Reduce parser

9

Compiler Design by Dr. S.Siva Nageswara Rao

10 of 42

Conflicts During Shift-Reduce Parsing

  • There are context-free grammars for which shift-reduce parsers cannot be used.
  • Stack contents and the next input symbol may not decide action:
    • shift/reduce conflict: Whether make a shift operation or a reduction.
    • reduce/reduce conflict: The parser cannot decide which of several reductions to make.
  • If a shift-reduce parser cannot be used for a grammar, that grammar is called as non-LR(k) grammar.

left to right right-most k lookhead

scanning derivation

  • An ambiguous grammar can never be a LR grammar.

10

Compiler Design by Dr. S.Siva Nageswara Rao

11 of 42

Shift-Reduce Parsers

  • There are two main categories of shift-reduce parsers
  • Operator-Precedence Parser
    • simple, but only a small class of grammars.

  1. LR-Parsers
    • covers wide range of grammars.
      • SLR – simple LR parser
      • Canonical LR – most general LR parser
      • LALR – intermediate LR parser (lookhead LR parser)
    • SLR, CLR and LALR work same, only their parsing tables are different.

SLR

CFG

CLR

LALR

11

Compiler Design by Dr. S.Siva Nageswara Rao

12 of 42

Operator precedence parser

Operator precedence grammar have the property that

1. no production right side is ε or

2. has two adjacent nonterminals.

E 🡪 EAE | (E) | -E | id

A 🡪 + | - | * | / | ↑

The above grammar is not operator grammar, because the right side EAE has two nonterminals.

If we substitute for a each of its alternatives, we obtain the following operator grammar:

E🡪E+E | E-E | E*E | E/E | E↑E | (E) | -E | id

12

Compiler Design by Dr. S.Siva Nageswara Rao

13 of 42

Three disjoint precedence relations <, > and =

a<b means a “yields precedence to” b

a=b means a “has the same precedence as b”

a>b means a “takes precedence over” b

id

+

*

$

id

>

>

>

+

<

>

<

>

*

<

>

>

>

$

<

<

<

13

Compiler Design by Dr. S.Siva Nageswara Rao

14 of 42

$id + id * id$

$ < id > + < id > * < id > $

$ < E > + < id > * < id > $

$ < + < id > * < id > $

$ < + < E > * < id > $

$ < + < * < id > $

$ < + < * < E > $

$ < + < * > $

$ < + < E * E > $

$ < + < E > $

$ < + > $

$ < E + E > $

$ < E > $

14

Compiler Design by Dr. S.Siva Nageswara Rao

15 of 42

Precedence Function

id

+

*

$

id

>

>

>

+

<

>

<

>

*

<

>

>

>

$

<

<

<

+

*

id

$

f

2

4

4

0

g

1

3

5

0

g+

fid

f*

g*

f+

gid

f$

g$

15

Compiler Design by Dr. S.Siva Nageswara Rao

16 of 42

LR Parsers

  • The most powerful shift-reduce parsing (yet efficient) is:

LR(k) parsing.

left to right right-most k lookhead

scanning derivation (k is omitted 🡺 it is 1)

  • LR parsing is attractive because:
    • LR parsing is most general non-backtracking shift-reduce parsing, yet it is still efficient.
    • The class of grammars that can be parsed using LR methods is a proper superset of the class of grammars that can be parsed with predictive parsers. LL(1)-Grammars ⊂ LR(1)-Grammars
    • An LR-parser can detect a syntactic error as soon as it is possible to do so a left-to-right scan of the input.

16

Compiler Design by Dr. S.Siva Nageswara Rao

17 of 42

17

Compiler Design by Dr. S.Siva Nageswara Rao

18 of 42

LR Parsers

  • LR-Parsers
    • covers wide range of grammars.
    • SLR – simple LR parser
    • CLR – most general LR parser
    • LALR – intermediate LR parser (look-head LR parser)
    • SLR, CLR and LALR work same (they used the same algorithm), only their parsing tables are different.

18

Compiler Design by Dr. S.Siva Nageswara Rao

19 of 42

LR Parsing Algorithm

Sm

Xm

Sm-1

Xm-1

.

.

S1

X1

S0

a1

...

ai

...

an

$

Action Table

terminals and $

s

t four different

a actions

t

e

s

Goto Table

non-terminal

s

t each item is

a a state number

t

e

s

LR Parsing Algorithm

stack

input

output

19

Compiler Design by Dr. S.Siva Nageswara Rao

20 of 42

A Configuration of LR Parsing Algorithm

  • A configuration of a LR parsing is:

( So X1 S1 ... Xm Sm, ai ai+1 ... an $ )

Stack Rest of Input

  • Sm and ai decides the parser action by consulting the parsing action table. (Initial Stack contains just So )
  • A configuration of a LR parsing represents the right sentential form:

X1 ... Xm ai ai+1 ... an $

20

Compiler Design by Dr. S.Siva Nageswara Rao

21 of 42

Actions of A LR-Parser

  1. shift s -- shifts the next input symbol and the state s onto the stack

( So X1 S1 ... Xm Sm, ai ai+1 ... an $ ) 🡺 ( So X1 S1 ... Xm Sm ai s, ai+1 ... an $ )

  1. reduce A→β (or rn where n is a production number)
    • pop 2|β| (=r) items from the stack;
    • then push A and s where s=goto[sm-r,A]

( So X1 S1 ... Xm Sm, ai ai+1 ... an $ ) 🡺 ( So X1 S1 ... Xm-r Sm-r A s, ai ... an $ )

    • Output is the reducing production reduce A→β
  • Accept – Parsing successfully completed
  • Error -- Parser detected an error (an empty entry in the action table)

21

Compiler Design by Dr. S.Siva Nageswara Rao

22 of 42

Reduce Action

  • pop 2|β| (=r) items from the stack; let us assume that β = Y1Y2...Yr
  • then push A and s where s=goto[sm-r,A]

( So X1 S1 ... Xm-r Sm-r Y1 Sm-r ...Yr Sm, ai ai+1 ... an $ )

🡺 ( So X1 S1 ... Xm-r Sm-r A s, ai ... an $ )

  • In fact, Y1Y2...Yr is a handle.

X1 ... Xm-r A ai ... an $ ⇒ X1 ... Xm Y1...Yr ai ai+1 ... an $

22

Compiler Design by Dr. S.Siva Nageswara Rao

23 of 42

(SLR) Parsing Tables for Expression Grammar

state

id

+

*

(

)

$

E

T

F

0

s5

s4

1

2

3

1

s6

acc

2

r2

s7

r2

r2

3

r4

r4

r4

r4

4

s5

s4

8

2

3

5

r6

r6

r6

r6

6

s5

s4

9

3

7

s5

s4

10

8

s6

s11

9

r1

s7

r1

r1

10

r3

r3

r3

r3

11

r5

r5

r5

r5

Action Table

Goto Table

1) E → E+T

2) E → T

3) T → T*F

4) T → F

5) F → (E)

6) F → id

23

Compiler Design by Dr. S.Siva Nageswara Rao

24 of 42

Actions of A (S)LR-Parser -- Example

stack input action output

0 id*id+id$ shift 5

0id5 *id+id$ reduce by F→id F→id

0F3 *id+id$ reduce by T→F T→F

0T2 *id+id$ shift 7

0T2*7 id+id$ shift 5

0T2*7id5 +id$ reduce by F→id F→id

0T2*7F10 +id$ reduce by T→T*F T→T*F

0T2 +id$ reduce by E→T E→T

0E1 +id$ shift 6

0E1+6 id$ shift 5

0E1+6id5 $ reduce by F→id F→id

0E1+6F3 $ reduce by T→F T→F

0E1+6T9 $ reduce by E→E+T E→E+T

0E1 $ accept

24

Compiler Design by Dr. S.Siva Nageswara Rao

25 of 42

Constructing SLR Parsing Tables – LR(0) Item

  • An LR(0) item of a grammar G is a production of G a dot at the some position of the right side.
  • Ex: A → aBb Possible LR(0) Items: A → .aBb

(four different possibility) A → a.Bb

A → aB.b

A → aBb.

  • Sets of LR(0) items will be the states of action and goto table of the SLR parser.
  • A collection of sets of LR(0) items (the canonical LR(0) collection) is the basis for constructing SLR parsers.
  • Augmented Grammar:

G’ is G with a new production rule S’→S where S’ is the new starting symbol.

25

Compiler Design by Dr. S.Siva Nageswara Rao

26 of 42

Construct LR parsing table for the given context-free grammar  S–>AA   ,  A–>aA|b�

  • Solution: STEP1: Find augmented grammar�The augmented grammar of the given grammar is:-
  •  S’–>.S    [0th production]    � S–>.AA  [1st production]     � A–>.aA [2nd production]      � A–>.b  [3rd production]
  • STEP2: Find LR(0) collection of items�Below is the figure showing the LR(0) collection of items. We will understand everything one by one
  • The terminals of this grammar are {a,b}.�The non-terminals of this grammar are {S,A}
  • RULE – �If any non-terminal has ‘ . ‘ preceding it, we have to write all its production and add ‘ . ‘ preceding each of its production.

26

Compiler Design by Dr. S.Siva Nageswara Rao

27 of 42

27

Compiler Design by Dr. S.Siva Nageswara Rao

28 of 42

STEP3: Find FOLLOW of LHS of production

FOLLOW(S)=$�FOLLOW(A)=a,b,$

STEP 4: Defining 2 functions:goto[list of non-terminals] and action[list of terminals] in the parsing table. Below is the SLR parsing table.

28

Compiler Design by Dr. S.Siva Nageswara Rao

29 of 42

  • $ is by default a nonterminal that takes accepting state.
  • 0,1,2,3,4,5,6 denotes I0,I1,I2,I3,I4,I5,I6
  • I0 gives A in I2, so 2 is added to the A column and 0 rows.
  • I0 gives S in I1,so 1 is added to the S column and 1 row.
  • similarly  5 is written in  A column and 2 row, 6 is written in A column and 3 row.
  • I0 gives a in I3 .so S3(shift 3) is added to a column and 0 row.
  • I0 gives b in I4 .so S4(shift 4) is added to the b column and 0 row.
  • Similarly, S3(shift 3) is added on a column and 2,3 row ,S4(shift 4) is added on b column and 2,3 rows.
  • I4 is reduced state as ‘ . ‘ is at the end. I4 is the 3rd production of grammar(A–>.b).LHS of this production is A. FOLLOW(A)=a,b,$  . write r3(reduced 3) in the columns of a,b,$ and 4th row
  • I5 is reduced state as ‘ . ‘ is at the end. I5 is the 1st production of grammar(S–>.AA). LHS of this production is S.�FOLLOW(S)=$  . write r1(reduced 1) in the column of $ and 5th row
  • I6 is a reduced state as ‘ . ‘ is at the end. I6 is the 2nd production of grammar( A–>.aA). The LHS of this production is A.�FOLLOW(A)=a,b,$  . write r2(reduced 2) in the columns of a,b,$ and 6th row

29

Compiler Design by Dr. S.Siva Nageswara Rao

30 of 42

LALR Parser :

  • LALR Parser is lookahead LR parser. It is  the most powerful parser which can handle large classes of grammar.
  • The size of CLR parsing table is quite large as compared to other parsing table. LALR reduces the size of this table.LALR works similar to CLR.
  • The only difference is , it combines the similar states of CLR parsing table  into one single state. �The general syntax becomes  [A->∝.B, a ]�where A->∝.B is production and a is a terminal or right end marker $�LR(1) items=LR(0) items + look ahead

30

Compiler Design by Dr. S.Siva Nageswara Rao

31 of 42

Construct CLR parsing table for the given context free grammar�S-->AA A-->aA|b

STEP1- Find augmented grammar�The augmented grammar of the given grammar is:-

S'-->.S ,$ [0th production]

S-->.AA ,$ [1st production]

A-->.aA ,a|b [2nd production]

A-->.b ,a|b [3rd production]

  • The initial look ahead is always $
  • Now,the 1st production came into existence because of ‘ . ‘ before ‘S’ in 0th production.There is nothing after ‘S’, so the lookahead of 0th production will be the lookahead of 1st production. i.e. :  S–>.AA ,$
  • Now,the 2nd production came into existence because of ‘ . ‘ before ‘A’ in the 1st production.�After ‘A’, there’s  ‘A’. So, FIRST(A) is a,b. Therefore, the lookahead of the 2nd production becomes a|b.
  • Now,the 3rd production is a part of the 2nd production.So, the look ahead will be the same

31

Compiler Design by Dr. S.Siva Nageswara Rao

32 of 42

STEP2 – Find LR(0) collection of itemsBelow is the figure showing the LR(0) collection of items. We will understand everything one by one

The terminals of this grammar are {a,b}The non-terminals of this grammar are {S,A}

32

Compiler Design by Dr. S.Siva Nageswara Rao

33 of 42

33

Compiler Design by Dr. S.Siva Nageswara Rao

34 of 42

Once we make a CLR parsing table, we can easily make a LALR parsing table from it.

In the step2 diagram, we can see that 

  • I3 and I6 are similar except their lookaheads.
  • I4 and I7 are similar except their lookaheads.
  • I8 and I9 are similar except their lookaheads.

In LALR parsing table construction , we merge these similar states. 

  • Wherever there is 3 or 6, make it  36(combined form)
  • Wherever there is 4 or 7, make it  47(combined form)
  • Wherever there is 8 or 9, make it  89(combined form)

34

Compiler Design by Dr. S.Siva Nageswara Rao

35 of 42

Now we have to remove the unwanted rows

  • As we can see, 36 row has same data twice, so we delete 1 row.
  • We combine two 47 row into one by combining each value in the single 47 row.
  • We combine two 89 row into one by combining each value in the single 89 row.

The final LALR table looks like the below.

Now we have to remove the unwanted rows

  • As we can see, 36 row has same data twice, so we delete 1 row.
  • We combine two 47 row into one by combining each value in the single 47 row.
  • We combine two 89 row into one by combining each value in the single 89 row.

The final LALR table looks like the below

35

Compiler Design by Dr. S.Siva Nageswara Rao

36 of 42

Dangling-else Ambiguity�

  • The dangling else problem in syntactic ambiguity. It occurs when we use nested if.
  • When there are multiple “if” statements, the “else” part doesn’t get a clear view with which “if ” it should combine. 

For example:

if (condition) {

} if (condition 1) {

} if (condition 2) {

} else

{

}

  • In the above example, there are multiple “ifs” with multiple conditions and here we want to pair the outermost if with the else part.
  • But the else part doesn’t get a clear view with which “if” condition it should pair. This leads to inappropriate results in programming. 

36

Compiler Design by Dr. S.Siva Nageswara Rao

37 of 42

The Problem of Dangling-else:

Dangling else can lead to serious problems. It can lead to wrong interpretations by the compiler and ultimately lead to wrong results. 

For example:

  • Initialize k=0 and o=0
  • if(ch>=3)
  • if(ch<=10)
  • k++;
  • else o++;
  • In this case, we don’t know when the variable “o” will get incremented. Either the first “if” condition might not get satisfied or the second “if” condition might not get satisfied.
  • Even the first “if” condition gets satisfied, the second “if” condition might fail which can lead to the execution of the “else” part. Thus it leads to wrong results. 
  • To solve the issue the programming languages like C, C++, Java combine the “else” part with the innermost “if” statement. But sometimes we want the outermost “if” statement to get combined with the “else” part. 

37

Compiler Design by Dr. S.Siva Nageswara Rao

38 of 42

Resolving Dangling-else Problem

The first way is to design non-ambiguous programming languages. 

Secondly, we can resolve the dangling-else problems in programming languages by using braces and indentation. 

For example:

if (condition) {

if (condition 1) {

if (condition 2) { }

}

} else { }

In the above example, we are using braces and indentation so as to avoid confusion. 

Third, we can also use the “if – else if – else” format so as to specifically indicate which “else” belongs to which “if”

For example:

if(condition) {

} else if(condition-1) {

} else if(condition-2){

}

else{

}

38

Compiler Design by Dr. S.Siva Nageswara Rao

39 of 42

Error Recovery in LR Parsing:

  • When there is no valid continuation for the input scanned thus far, LR parsers report an error.
  • Before notifying a mistake, a CLR parser never performs a single reduction and an SLR or LALR may do multiple reductions, but they will never move an incorrect input symbol into the stack.
  • When the parser checks the table and discovers that the relevant action item is empty, an error is recognized in LR parsing. Goto entries can never be used to detect errors.

LR Parser Basically Uses the Mentioned Two Techniques to Detect Errors:

  • Syntactic Phase recovery
  • Panic mode recovery

39

Compiler Design by Dr. S.Siva Nageswara Rao

40 of 42

Syntactic Phase Recovery:

  • Syntactic Phase Recovery Follows the Given Steps:
  • Programmer mistakes that call error procedures in the parser table are determined based on the language.
  • Creating error procedures that can alter the top of the stack and/or certain symbols on input in a way that is acceptable for table error entries.

There are some of the errors that are detected during the syntactic phase recovery:

  • Errors in structure
  • Missing operator
  • Misspelled keywords
  • Unbalanced parenthesis

40

Compiler Design by Dr. S.Siva Nageswara Rao

41 of 42

Panic Mode Recovery:

  • This approach involves removing consecutive characters from the input one by one until a set of synchronized tokens is obtained.
  • Delimiters such as or are synchronizing tokens. The benefit is that it is simple to implement and ensures that you do not end up in an infinite loop.
  • The drawback is that a significant quantity of data is skipped without being checked for additional problems.
  • Panic mode recovery follows the given steps:
  • Scan the stack until you find a state ‘a’ with a goto() on a certain non-terminal ‘B’ (by removing states from the stack).
  • Until a symbol ‘b’ that can follow ‘B’ is identified, zero or more input symbols are rejected.

41

Compiler Design by Dr. S.Siva Nageswara Rao

42 of 42

Difference between Top-Down Parsing and Bottom-Up Parsing�

Aspect

Top-Down Parsing

Bottom-Up Parsing

Approach

Starts from the top (start symbol) of the grammar and tries to match it to the input string.

Starts from the input tokens and works upwards to construct the parse tree.

Process

Recursive descent parsing is a common approach, where each nonterminal in the grammar has a corresponding parsing function.

Shift-reduce parsing is a common approach, where tokens are shifted onto a stack and then reduced according to grammar rules.

Grammar Restrictions

Often used for LL(k) grammars, where k is the lookahead size.

Used for a broader range of grammars, including LR(k) grammars, where k is the lookahead size.

Predictability

Sometimes limited due to the need for lookahead and potential left recursion.

Generally more predictable and can handle a wider range of grammars.

Error Handling

Typically offers better error messages, as parsing functions are closely tied to grammar rules.

Error messages might be less descriptive, as errors are detected during the reduction phase.

Constructed Tree

Constructs a parse tree from top to bottom, which can closely resemble the input structure.

Constructs a parse tree from bottom to top, which might differ from the input's visual structure.

Examples

Recursive descent parsing, LL(k) parsers.

LR(0), SLR(1), LALR(1), LR(1) parsers.

Complexity

Can be simpler to implement for certain grammars but might be less efficient.

Often more complex to implement but tends to be more efficient for larger grammars.

42

Compiler Design by Dr. S.Siva Nageswara Rao