1 of 65

Theory of Computation

Unit-3 Context Free Grammar

PPT

Prepared By

Archana Kadam

Assistant Professor,

Department of Computer Engineering, PCCOE

1

2 of 65

Unit-3 Context-Free Grammar

Introduction, Regular Grammar, Context Free Grammar- Definition, Derivation. Sentential form, parse tree, Ambiguous Grammar, Simplification of CFG: Eliminating unit productions, useless production, useless symbols, Greibach normal form, Chomsky normal form. Types of Grammar: Chomsky Hierarchy, Context Free Language (CFL): Closure properties of CFL.

Case Study- CFG for Parenthesis Match- XML and Document Type Definitions, Natural Language Processing- Text Parsing

3 of 65

Definition

  • A context-free grammar (CFG) G is a quadruple (V, T, P, S) where
  • V: a set of non-terminal symbols
  • T: a set of terminals (V ∩ T = Ǿ)
  • P: a set of rules (P: V → (V U T)*)
  • S: a start symbol.

4 of 65

Example

  • V = {q, f,}
  • T = {0, 1}
  • P = {q → 11q, q → 00f,

f → 11f, f → ε }

  • S = q
  • (R= {q → 11q | 00f, f → 11f | ε })

5 of 65

How do we use rules?

  • If A → B, then xAy xBy and we say that

xAy derivates xBy.

  • If s ··· t, then we write s * t.
  • A string x in Σ* is generated by G=(V,Σ,R,S)

if S * x.

  • L(G) = { x in Σ* | S * x}.

6 of 65

Example

  • G = ({S}, {0,1}. {S → 0S1 | ε }, S)
  • ε in L(G) because S -> ε .
  • 01 in L(G) because S -> 0S1-> 01.
  • 0011 in L(G) because

S -> 0S1 -> 00S11 -> 0011.

  • 0 1 in L(G) because S -> * 0 1 .
  • L(G) = {0 1 | n > 0}

7 of 65

Context-Free Language (CFL)

  • A language L is context-free if there exists a CFG G such that L = L(G).
  • Example are palindrome
  • L(G) is
  • S-> aSa | bSb | a | b

8 of 65

Corollary

  • Every regular language is a CFL.
  • The class of regular languages is a proper subclass of CFLs.

CFL

Regular

9 of 65

Context Free Grammar

  • Construct the CFG for L ={ an bn | n>=1}
  • L = {ab, aabb, aaabbb,…..}

P : S-> aSb | ab

Construct the CFG for L of palindrome strings. S - > aSa | bSb | Ɛ

9

10 of 65

Derivations

  • There are three ways to derive a string from grammar
  • Leftmost Derivation : always deriving leftmost non-terminal first
  • Rightmost Derivation : always deriving rightmost non-terminal first
  • Derivation Tree / Parse Tree

10

11 of 65

Derivations Examples

  • Consider following CFG , G= {(S,A), (a,b), P, S}
  • Where P: S-> aAS | a
          • A-> SbA | SS | ba
  • Derive string aabbaa using leftmost and rightmost derivation
  • First number the production rules
  • 1. S -> aAS
  • 2. S -> a
  • 3. A -> SbA
  • 4. A -> SS
  • 5. A -> ba

11

Left most derivation is

S-> aAS…………Rule 1

-> aSbAS……...Rule 3

->aabAS……….Rule 2

->a a b b a S….Rule 5

-> a a b b a a…...Rule 2

12 of 65

Derivations Examples

  • Consider following CFG , G= {(S,A), (a,b), P, S}
  • Where P: S-> aAS | a
          • A-> SbA | SS | ba
  • Derive string aabbaa using leftmost and rightmost derivation
  • First number the production rules
  • 1. S -> aAS
  • 2. S -> a
  • 3. A -> SbA
  • 4. A -> SS
  • 5. A -> ba

12

Right most derivation is

S-> aAS…………Rule 1

-> aAa….……...Rule 2

->aSbAa……….Rule 3

->a S b b a a….Rule 5

-> a a b b a a…...Rule 2

13 of 65

Derivations Examples

  • Derive string aabbaa using derivation tree
  • First number the production rules
  • 1. S -> aAS
  • 2. S -> a
  • 3. A -> SbA
  • 4. A -> SS
  • 5. A -> ba

13

Derivation Tree

S

a

A

S

S

b

A

b

b

a

a

14 of 65

Ambiguous Context Free Grammar

  • A CFG for a language is said to be ambiguous, if there exists at least one string which can be generated more than one ways.
  • There exists more than one leftmost derivation, rightmost derivation and more than one derivation tree.

14

15 of 65

Ambiguous Context Free Grammar

  • Example :
  • E -> E + E | E * E | id

Now to derive a string id + id * id there are two leftmost derivation given below

So this grammar is ambiguous

15

E => E + E

=> id + E

=> id + E * E

=> id + id * E

=> id + id * id

E => E * E

=> E + E * E

=> id + E * E

=> id + id * E

=> id + id * id

16 of 65

Ambiguous Context Free Grammar

  • Example :

Check the ambiguity in following grammar

S -> iCtS | iCtSeS | a

C -> b

Consider string ‘ibtibtaea’

16

17 of 65

Ambiguous Context Free Grammar

S -> iCtS | iCtSeS | a

C -> b

Consider string ‘ibtibtaea’

So this grammar is ambiguous

17

S -> iCtS

-> ibtS

-> ibtiCtSeS

-> ibtibtaea

S -> iCtSeS

-> ibtSeS

-> ibtiCtSeS

->ibtibtaea

18 of 65

Ambiguous Context Free Grammar

Check whether the grammar is ambiguous or not.

A->AA

A->(A)

A->a

For the string "a(a)(a)a" 

18

19 of 65

19

As two derivation tree Grammar is Ambiguous

20 of 65

Removal of Ambiguity

Example :

E -> E + E | E * E | id

Postpone higher priority operators from start symbol

Use other Non terminals to take care of it

E-> E + T | T

T - > T * F | F

F -> id

20

21 of 65

CFG Simplification

21

22 of 65

Three ways to simplify/clean a CFG

(clean)

  1. Eliminate useless symbols

(simplify)

  1. Eliminate ε-productions

  • Eliminate unit productions

22

A => ε

A => B

23 of 65

Eliminating useless symbols

A symbol X is reachable if there exists:

    • S 🡺* α X β

A symbol X is generating if there exists:

    • X 🡺* w,
      • for some w T*

For a symbol X to be “useful”, it has to be both reachable and generating

      • S 🡺* α X β 🡺* w’, for some w’ T*

23

reachable

generating

24 of 65

Algorithm to detect useless symbols

  1. First, eliminate all symbols that are not generating

  • Next, eliminate all symbols that are not reachable

24

25 of 65

Example: Useless symbols

  • S🡺AB | a
  • A🡺 b

  1. A, S are generating
  2. B is not generating (and therefore B is useless)
  3. ==> Eliminating B… (i.e., remove all productions that involve B)
    1. S🡺 a
    2. A 🡺 b
  4. Now, A is not reachable and therefore is useless

  • Simplified G: S 🡺 a

25

26 of 65

Eliminating ε-productions

Theorem: If G=(V,T,P,S) is a CFG for a language L, then L\ {ε} has a CFG without ε-productions

Definition: A is “nullable” if A🡺* ε

  • If A is nullable, then any production of the form �“B🡺 CAD” can be simulated by:
      • B 🡺 CD | CAD
        • This can allow us to remove ε transitions for A

26

A 🡺 ε

What’s the point of removing ε-productions?

27 of 65

Algorithm to detect all nullable variables

  • Basis:
    • If A🡺 ε is a production in G, then A is nullable�(note: A can still have other productions)
  • Induction:
    • If there is a production B🡺 C1C2…Ck, where every Ci is nullable, then B is also nullable

27

28 of 65

Eliminating ε-productions

Given: G=(V,T,P,S)

Algorithm:

    • Detect all nullable variables in G
    • Then construct G1=(V,T,P1,S) as follows:
      1. For each production of the form: A🡺X1X2Xk, where k≥1, suppose m out of the k Xi’s are nullable symbols
      2. Then G1 will have 2m versions for this production
        1. i.e, all combinations where each Xi is either present or absent
      3. Alternatively, if a production is of the form: A🡺ε, then remove it

28

29 of 65

Example: Eliminating ε-productions

  • Let L be the language represented by the following CFG G:
    1. S🡺AB
    2. A🡺aAA | ε
    3. B🡺bBB | ε

Goal: To construct G1, which is the grammar for L-{ε}

  • Nullable symbols: {A, B}

  • G1 can be constructed from G as follows:
    • B 🡺 b | bB | bB | bBB
      • ==> B 🡺 b | bB | bBB
    • Similarly, A 🡺 a | aA | aAA
    • Similarly, S 🡺 A | B | AB

  • Note: L(G) = L(G1) U {ε}

29

G1:

  • S 🡺 A | B | AB
  • A 🡺 a | aA | aAA
  • B 🡺 b | bB | bBB
  • S 🡺 ε

+

Simplified�grammar

30 of 65

Eliminating unit productions

30

A => B

B has to be a variable

What’s the point of removing unit transitions ?

A=>B | …

B=>C | …

C=>D | …

D=>xxx | yyy | zzz

A=>xxx | yyy | zzz | …

B=> xxx | yyy | zzz | …

C=> xxx | yyy | zzz | …

D=>xxx | yyy | zzz

Will save #substitutions

E.g.,

before

after

31 of 65

Putting all this together…

  • Theorem: If G is a CFG for a language that contains at least one string other than ε, then there is another CFG G1, such that L(G1)=L(G) - ε, and G1 has:
    • no ε -productions
    • no unit productions
    • no useless symbols

  • Algorithm:

Step 1) eliminate ε -productions

Step 2) eliminate unit productions

Step 3) eliminate useless symbols

31

Again,

the order is�important!

Why?

32 of 65

Normal Forms

32

33 of 65

Why normal forms?

  • If all productions of the grammar could be expressed in the same form(s), then:

    • It becomes easy to design algorithms that use the grammar

    • It becomes easy to show proofs and properties

33

34 of 65

Chomsky Normal Form (CNF)

Let G be a CFG for some L-{ε}

Definition:

G is said to be in Chomsky Normal Form if all its productions are in one of the following two forms:

      • A 🡺 BC where A,B,C are variables, or
      • A 🡺 a where a is a terminal
    • G has no useless symbols
    • G has no unit productions
    • G has no ε-productions

34

35 of 65

CNF checklist

35

G1:

      • E 🡺 E+T | T*F | (E) | Ia | Ib | I0 | I1
      • T 🡺 T*F | (E) | Ia | Ib | I0 | I1
      • F 🡺 (E) | Ia | Ib | I0 | I1
      • I 🡺 a | b | Ia | Ib | I0 | I1

Checklist:

  • G has no ε-productions
  • G has no unit productions
  • G has no useless symbols
  • But…
    • the normal form for productions is violated

Is this grammar in CNF?

So, the grammar is not in CNF

36 of 65

How to convert a G into CNF?

  • Assumption: G has no ε-productions, unit productions or useless symbols

  1. For every terminal a that appears in the body of a production:
    1. create a unique variable, say Xa, with a production Xa 🡺 a, and
    2. replace all other instances of a in G by Xa

  • Now, all productions will be in one of the following two forms:
    • A 🡺 B1B2… Bk (k≥3) or A🡺a

  1. Replace each production of the form A 🡺 B1B2B3… Bk by:

    • A🡺B1C1 C1🡺B2C2 … Ck-3🡺Bk-2Ck-2 Ck-2🡺Bk-1Bk

36

B1 C1

B2 C2

and so on…

37 of 65

Example #1

37

G:

S => AS | BABC

A => A1 | 0A1 | 01

B => 0B | 0

C => 1C | 1

X0 => 0

X1 => 1

S => AS | BY1

Y1 => AY2

Y2 => BC

A => AX1 | X0Y3 | X0X1

Y3 => AX1

B => X0B | 0

C => X1C | 1

G in CNF:

All productions are of the form: A=>BC or A=>a

38 of 65

Example #2

38

G:

  1. E 🡺 E+T | T*F | (E) | Ia | Ib | I0 | I1
  2. T 🡺 T*F | (E) | Ia | Ib | I0 | I1
  3. F 🡺 (E) | Ia | Ib | I0 | I1
  4. I 🡺 a | b | Ia | Ib | I0 | I1
  1. E 🡺 EX+T | TX*F | X(EX) | IXa | IXb | IX0 | IX1
  2. T 🡺 TX*F | X(EX) | IXa | IXb | IX0 | IX1
  3. F 🡺 X(EX) | IXa | IXb | IX0 | IX1
  4. I 🡺 Xa | Xb | IXa | IXb | IX0 | IX1
  5. X+ 🡺 +
  6. X* 🡺 *
  7. X+ 🡺 +
  8. X( 🡺 (
  9. …….

Step (1)

  1. E 🡺 EC1 | TC2 | X(C3 | IXa | IXb | IX0 | IX1
  2. C1 🡺 X+T
  3. C2 🡺 X*F
  4. C3 🡺 EX)
  5. T 🡺 ..…….
  6. ….

Step (2)

39 of 65

Other Normal Forms

  • Griebach Normal Form (GNF)
    • All productions of the form

A==>a α

Where a is terminal and α is string of zero or more non-terminals

39

40 of 65

Greibach Normal Form (GNF)�

    • Two ways to handle to convert grmmar in GNF
    • Theorem 1: By substitution Method
    • Example:

A->ABA | AB

A-> aA | a

Using theorem 1 of substitution for B in production of first grammsr

A-> aABA | aBA | aAB| aB

Now all grammar production are in required format.

(Kindly note do not replace for grammar production already in format)

40

41 of 65

Greibach Normal Form (GNF)�

    • Two ways to handle to convert grammar in GNF
    • Theorem 2:is used to handle left recursive grammar
    • If CFG consist of production of the form

A -> Aα1 | Aα2 |….| A αr | β1 | β2 |….| βn

Then equivalent grammar can be formed as

A -> β1 | β2 |….| βn

A -> β1 Z | β2 Z|….| βn Z

Z -> α1 | α2 |….| αr

Z -> α1 Z | α2 Z |….| αr Z

41

42 of 65

Greibach Normal Form (GNF)�

Example :

A -> AB| aB | bB| c

B -> b

To bring this grammar in GNF will apply theorem 2 to handle left recursive grammar of A -> AB

Here, α1 = B,

β1 = aB, β2 = bB, , β3 = c

Then,

A -> aB | bB| c

A -> | aBZ | bBZ| cZ

Z -> B | BZ

42

Now A’s production are in GNF.

Need to handle Z, will apply theorem one of substitution

So,

Z-> b | bZ

Final Answer is

A -> aB | bB| c

A -> | aBZ | bBZ| cZ

Z -> b | bZ

43 of 65

Chomsky Hierarchy

43

26-09-2023

44 of 65

Introduction

  • Different classes of phrase structure grammar can be obtained with few constraints
  • It is a containment hierarchy
  • Described by Chomsky-Schitzenberger
  • Suggested four different classes

Type-0 (Unrestricted grammar)

Type-1 (Context sensitive grammar)

Type-2 (Context free grammar)

Type-3 (Regular grammar)

44

26-09-2023

45 of 65

Type-0 (Unrestricted grammar)

  • No restriction on production rules
  • Production of form α −> β , where α ≠ ε
  • Semi-thue grammar or unrestricted grammar
  • Generate languages are recognized by Turing machines (recursively enumerable languages)
  • For example: A −>AB, AB −>BC, B −>AcD, Ac −>abc, D −> ε

45

26-09-2023

46 of 65

Type-1 (Context sensitive grammar)

Restriction on production rules are as follows:

1. for α −> β, the length of β is at least as much as length of α, except for S −> ε

2. The rule S −> ε is allowed only if S doesn’t appear on RHS

3. Context sensitive because of productions of form α1Aiα2 −> α1βα2 , β ≠ ε and α1 , α2 may or may not be empty

Context sensitive language (CSL) recognized by turing machine

Example: A −>aBC, aB −>ab, bC−>bc

Derivation:

A=>aBC=>abC ;

abC=>abc ; finally A=>abc

46

26-09-2023

47 of 65

Type-2 (Context free grammar)

  • Production of form A−> α , where α is sentential form
  • Context free languages (CFL) generated by context free grammar are recognized by Push Down Automata (PDA)
  • For example:
  • S −>aSb, S −>bSb, S −>a, S −>b

47

26-09-2023

48 of 65

Type-3 (Regular grammar)

  • The restriction on production rules are as follows:
  • Left-hand side (LHS) should contain only one non-terminal (NT) symbol
  • Right-hand side (RHS) can contain at most one NT symbol, which should be rightmost or leftmost symbol
  • Regular language by regular grammar , which are recognized by finite state machines (FSM), and denoted by regular expressions
  • Based on position of RHS non-terminal symbol, the regular grammar is classified as left-linear or right-linear grammar

48

26-09-2023

49 of 65

Type-3 (Regular grammar)

  1. Left-linear grammar:
  2. allowed productions are: A −>Bw , A −>w,
  3. The rule S −> ε is allowed only if S doesn’t appear on RHS
  4. For example:

S −>Ca|Bb ,

C −>Bb ,

B −>Ba|b

49

26-09-2023

50 of 65

Type-3 (Regular grammar)

2. Right-linear grammar:

  • allowed productions are:
  • A −>wB , A −>w,
  • The rule S −> ε is allowed only if S doesn’t appear on RHS
  • For example: S −>0A, A −>0A | 1

50

26-09-2023

51 of 65

CFL Closure Properties

51

52 of 65

Closure Property Results

  • CFLs are closed under:
    • Union
    • Concatenation
    • Kleene closure operator
    • Substitution
    • Homomorphism, inverse homomorphism
    • reversal
  • CFLs are not closed under:
    • Intersection
    • Difference
    • Complementation

52

53 of 65

Substitution of a CFL: example

    • Let L = language of binary palindromes s.t., substitutions for 0 and 1 are defined as follows:
      • s(0) = {anbn | n ≥1}, s(1) = {xx,yy}
    • Prove that s(L) is also a CFL.

53

CFG for L:

S=> 0S0|1S1|ε

CFG for s(0):

S0=> aS0b | ab

CFG for s(1):

S1=> xx | yy

Therefore, CFG for s(L):

S=> S0SS0 | S1 S S1

S0=> aS0b | ab

S1=> xx | yy

54 of 65

CFLs are closed under union

Let L1 and L2 be CFLs

To show: L2 U L2 is also a CFL

  • Make a new language:
    • Lnew = {a,b} s.t., s(a) = L1 and s(b) = L2

==> s(Lnew) == same as == L1 U L2

  • A more direct, alternative proof
    • Let S1 and S2 be the starting variables of the grammars for L1 and L2
      • Then, Snew => S1 | S2

54

Let us show by using the result of Substitution

55 of 65

CFLs are closed under concatenation

  • Let L1 and L2 be CFLs

  • Make Lnew= {ab} s.t., � s(a) = L1 and s(b)= L2

==> L1 L2 = s(Lnew)

  • L1 grammar is S-> aSa|bSb|a|b|Ɛ
  • L2 grammar is S-> aSb |Ɛ

S->S1.S2

S1-> aSa|bSb|a|b|Ɛ

S2-> aSb |Ɛ

55

Let us show by using the result of Substitution

56 of 65

CFLs are closed under Kleene Closure

  • Let L be a CFL
    • Let Lnew = {a}* and s(a) = L1
    • Then, L* = s(Lnew)

56

  • Let L ={a}, S->a
  • S->S1S | Ɛ
  • S1->a

57 of 65

CFLs are closed under Reversal

  • Let L be a CFL, with grammar G=(V,T,P,S)
  • For LR, construct GR=(V,T,PR,S) s.t.,
    • If A==> α is in P, then:
      • A==> αR is in PR
      • (that is, reverse every production)
  • S-> aSb
  • SR -> bSa

57

We won’t use substitution to prove this result

58 of 65

CFLs are not closed under Intersection

  • Existential proof:
    • L1 = {0n1n2i | n≥1,i≥1}
    • L2 = {0i1n2n | n≥1,i≥1}
  • Both L1 and L2 are CFLs
    • S-> AB
    • A->0A1 | 01
    • B->2B|2
  • Therefore, L1 ∩ L2 will be 0n1n2n
  • Not possible to construct grammar so CFL’s are not closed under intersection

58

Some negative closure results

59 of 65

CFLs are not closed under complementation

  • Follows from the fact that CFLs are not closed under intersection

  • L1 ∩ L2 = L1 U L2

59

Some negative closure results

Logic: if CFLs were to be closed under complementation � 🡺 the whole right hand side becomes a CFL (because � CFL is closed for union)

🡺 the left hand side (intersection) is also a CFL

🡺 but we just showed CFLs are � NOT closed under intersection!

🡺 CFLs cannot be closed under complementation.

60 of 65

CFLs are not closed under difference

  • Follows from the fact that CFLs are not closed under complementation

  • Because, if CFLs are closed under difference, then:
    • L = ∑* - L
    • So L has to be a CFL too
    • Contradiction

60

Some negative closure results

61 of 65

Decision Properties

  • Emptiness test
    • Generating test
    • Reachability test
  • Membership test
    • PDA acceptance

61

62 of 65

Application of CFL

  • Compiler –Parsing
    • Eg a=b;
    • <STMT> -> <Assign STMT> | <If STMT> |<while loop>|<for loop>
    • <while loop>-> while <condition><block of stmt>
    • <block of stmt> -> <List of STMT>|<STMT>
    • <Assign STMT> -> <ID>=<ID>

62

63 of 65

Application of CFL

  • Markup Languages
    • DOC -> Element DOC
    • Element -> Text | <EM> DOC </EM> | <P> DOC </P> | <OL> List </OL>

  • XML
  • DTD

63

64 of 65

64

memo :

addressee: sender date title body;

addressee : TEXT;

sender : TEXT;

date : DATE;

title : TEXT;

body : TEXT;

For example, the following would be a valid memo:

<memo> <addressee>John</addressee> <sender>Carla</addressee> <date>1998-09-01</date> <title>New coffee maker</title>

<body> The new coffee maker has been installed! Operation is simple: put a cup in the opening and press the red button. </body>

</memo>

65 of 65

“Undecidable” problems for CFL

  • Is a given CFG G ambiguous?
  • Is a given CFL inherently ambiguous?
  • Is the intersection of two CFLs empty?
  • Are two CFLs the same?
  • Is a given L(G) equal to ∑*?

65