Id3 = 0
while (id3 < 10) {
println(“”,id3);� id3 = id3 + 1 }
source code
i�d3
=�
0�LF
w
id3�=�0�while�(�id3
<�10�)
lexer
characters
words
(tokens)
trees
parser
assign
while
i 0
+
*
3
7
i
assign
a[i]
<
i
10
Compiler
Parse Tree vs Abstract Syntax Tree (AST)
while (x > 0) x = x - 1
Pretty printer: takes abstract syntax tree (AST) and outputs the �leaves of one possible (concrete) parse tree.
parse(prettyPrint(ast)) ≈ ast
Parse Tree vs Abstract Syntax Tree (AST)
Abstract Syntax Trees for Statements
statmt ::= println ( stringConst , ident )
| ident = expr
| if ( expr ) statmt (else statmt)?
| while ( expr ) statmt� | { statmt* }
abstract class Statmt
case class PrintlnS(msg : String, var : Identifier) extends Statmt
case class Assignment(left : Identifier, right : Expr) extends Statmt
case class If(cond : Expr, trueBr : Statmt,
falseBr : Option[Statmt]) extends Statmt
case class While(cond : Expr, body : Expr) extends Statmt
case class Block(sts : List[Statmt]) extends Statmt
grammar:
AST classes:
Abstract Syntax Trees for Statements
statmt ::= println ( stringConst , ident )
| ident = expr
| if ( expr ) statmt (else statmt)?
| while ( expr ) statmt� | { statmt* }
abstract class Statmt
case class PrintlnS(msg : String, var : Identifier) extends Statmt
case class Assignment(left : Identifier, right : Expr) extends Statmt
case class If(cond : Expr, trueBr : Statmt,
falseBr : Option[Statmt]) extends Statmt
case class While(cond : Expr, body : Statmt) extends Statmt
case class Block(sts : List[Statmt]) extends Statmt
Our Parser Produced Nothing ☹
def skip(t : Token) : Unit = � if (lexer.token==t) lexer.next
else error(“Expected”+ t)
def expr : Unit = { … }
// statmt ::=
def statmt : Unit = {
// println ( stringConst , ident )
if (lexer.token == Println) { lexer.next;
skip(OParen)� skip(StringConst); skip(Comma)
skipIdentifier; skip(CParen)
// | ident = expr
} else if (lexer.token == Ident) { lexer.next;
skip(Equality); expr
New Parser: Returning an AST ☺
def getIdent: String = { lexer.token match {� case ID(str) => str� case _ => error(“Expected an identifier but found” + lexer.token)�}}
def expr : Expr = { … }
def statmt : Statmt = {
// println ( stringConst , ident )
if (lexer.token == Println) { lexer.next;
skip(openParen); val s = getString
skip(comma)
val id = getIdent; skip(closedParen)
PrintlnS(s, id)
} else if …
Constructing Tree for ‘if’
def statmt : Statmt = {
…
// if ( expr ) statmt (else statmt)?
// case class If(cond : Expr, trueBr: Statmt, falseBr: Option[Statmt])
} else if (lexer.token == ifKeyword) { lexer.next;
skip(openParen); val c = expr; skip(closedParen);
val trueBr = statmt
val elseBr = if (lexer.token == elseKeyword) {
lexer.next; Some(statmt) } else None
If(c, trueBr, elseBr) // we made a tree node ☺
}
Task: Constructing AST for ‘while’
def statmt : Statmt = {
…
// while ( expr ) statmt
// case class While(cond : Expr, body : Expr) extends Statmt
} else if (lexer.token == WhileKeyword) { �������
} else …
Here each alternative started with a different token
statmt ::=
println ( stringConst , ident )
| ident = expr
| if ( expr ) statmt (else statmt)?
| while ( expr ) statmt� | { statmt* }
What if this is not the case?
Left Factoring Example: Function Calls
statmt ::=
println ( stringConst , ident )
| ident = expr
| if ( expr ) statmt (else statmt)?
| while ( expr ) statmt� | { statmt* }
| ident (expr (, expr )* )
code to parse the grammar:
} else if (lexer.token.class == Ident) {
???� }
foo = 42 + x
foo ( u , v )
Left Factoring Example: Function Calls
statmt ::=
println ( stringConst , ident )
| ident assignmentOrCall
| if ( expr ) statmt (else statmt)?
| while ( expr ) statmt� | { statmt* }
assignmentOrCall ::= “=“ expr | (expr (, expr )* )
code to parse the grammar:
} else if (lexer.token.class == Ident) {
val id = getIdentifier(lexer.token); lexer.next
assignmentOrCall(id)� }
// Factoring pulls common parts from alternatives
Beyond Statements:�Parsing Expressions
While Language with Simple Expressions
expr ::= intLiteral | ident
| expr ( + | / ) expr
statmt ::=
println ( stringConst , ident )
| ident = expr
| if ( expr ) statmt (else statmt)?
| while ( expr ) statmt� | { statmt* }
Abstract Syntax Trees for Expressions
abstract class Expr
case class IntLiteral(x : Int) extends Expr
case class Variable(id : Identifier) extends Expr
case class Plus(e1 : Expr, e2 : Expr) extends Expr
case class Divide(e1 : Expr, e2 : Expr) extends Expr
expr ::= intLiteral | ident
| expr + expr | expr / expr
foo + 42 / bar + arg
Parser That Follows the Grammar?
def expr : Expr = {
if (??) IntLiteral(getInt(lexer.token))
else if (??) Variable(getIdent(lexer.token))
else if (??) {
val e1 = expr; val op = lexer.token; val e2 = expr
op match Plus {
case PlusToken => Plus(e1, e2)
case DividesToken => Divides(e1, e2)
} }
expr ::= intLiteral | ident
| expr + expr | expr / expr
When should parser enter the recursive case?!
input:
foo + 42 / bar + arg
Ambiguous Grammars
expr ::= intLiteral | ident
| expr + expr | expr / expr
ident + intLiteral / ident + ident
Ambiguous grammar: if some token sequence has multiple parse trees
(then it is has multiple abstract trees).
Each node in parse tree is given by
one grammar alternative.
Ambiguous Expression Grammar
expr ::= intLiteral | ident
| expr + expr | expr / expr
Show that the input above has two parse trees!
Each node in parse tree is given by
one grammar alternative.
ident + intLiteral / ident + ident
Exercise: Balanced Parentheses I
Show that the following balanced parentheses grammar is ambiguous (by finding two parse trees for some input sequence).
B ::= ε | ( B ) | B B �
Remark
B -> (B) -> (BB) -> ((B)B) -> ((B)) -> (())
B -> (B) -> (BB) -> ((B)B) -> (()B) -> (())
this correspond to different orders in which nodes in the tree are expanded.
Exercise: Balanced Parentheses
Show that the following balanced parentheses grammar is ambiguous (by finding two parse trees for some input sequence) and find unambiguous grammar for the same language.
B ::= ε | ( B ) | B B �
Solution for Unambiguous Parenthesis Grammar
B ::= ε | B (B)
B ::= ε | C B� C ::= (B)
B ::= ε | B (B)
This grammar is not ambiguous and is the solution. We did not prove unambiguity (we only tried to find ambiguous trees but did not find any).
Exercise: �Left Recursive and Right Recursive
We call a production rule “left recursive” if it is of the form
A ::= A p
for some sequence of symbols p. Similarly, a "right-recursive" rule is of a form
A ::= q A
Is every context free grammar that contains both left and right recursive rule for a some nonterminal A ambiguous?
Answer: it is ambiguous if such A is such that� S => … => uAv => … => w for w a string of terminals
An attempt to rewrite the grammar
def simpleExpr : Expr = { … }
def expr : Expr = {
var e = simpleExpr
while (lexer.token == PlusToken ||
lexer.token == DividesToken)) {
val op = lexer.token
val eNew = simpleExpr
op match {
case TokenPlus => { e = Plus(e, eNew) }
case TokenDiv => { e = Divide(e, eNew) }� }
}
e }
expr ::= simpleExpr (( + | / ) simpleExpr)*
simpleExpr ::= intLiteral | ident
foo + 42 / bar + arg
Not ambiguous, but gives wrong tree.
Making Grammar Unambiguous�and Constructing Correct Trees
Goal: Build Expression Trees
abstract class Expr
case class Variable(id : Identifier) extends Expr
case class Minus(e1 : Expr, e2 : Expr) extends Expr
case class Exp(e1 : Expr, e2 : Expr) extends Expr
different parse trees give different ASTs:
Minus(e1, Minus(e2,e3)) e1 - (e2 - e3)
Minus(Minus(e1,e2),e3) (e1 - e2) - e3
we want priorities and associativity such that
a – b^c^d – e becomes, just like (a – (b^(c^d))) – e
Minus(Minus(a, Exp(b, Exp(c, d))), e)
1) Layer the grammar by priorities
expr ::= term (- term)*
term ::= factor (^ factor)*
factor ::= id | (expr)
lower priority binds weaker,
so it goes outside
expr ::= ident | expr - expr | expr ^ expr | (expr)
2) Building trees: left-associative "-"
LEFT-associative operator
x – y – z ➔ (x – y) – z
Minus(Minus(Var(“x”),Var(“y”)), Var(“z”))
def expr : Expr = {
var e =
while (lexer.token == MinusToken) {
lexer.next
}
e
}
3) Building trees: right-associative "^"
RIGHT-associative operator – using recursion � (or also loop and then reverse a list)
x ^ y ^ z ➔ x ^ (y ^ z)
Exp(x, Exp(y,z) )
def expr : Expr = {
val e = factor
if (lexer.token == ExpToken) {
lexer.next
Exp(e, expr)
} else e
}
Manual Construction of Parsers
Exercise: Unary Minus
1) Show that the grammar
A ::= − A � A ::= A − id � A ::= id
is ambiguous by finding a string that has two different parse trees. Show those parse trees.
2) Make two different unambiguous grammars for the same language:� a) One where prefix minus binds stronger than infix minus.� b) One where infix minus binds stronger than prefix minus.�3) Show the syntax trees using the new grammars for the string you used to prove the original grammar ambiguous.
4) Give a regular expression describing the same language.
Unary Minus Solution Sketch
1) An example of a string with two parse trees is� - id - id�The two parse trees are generated by these imaginary parentheses (shown red): -(id-id) (-id)-id
and can generated by these derivations that give different parse trees� A => -A => - A - id => - id - id� A => A - id => - A - id => - id - id
2) a) prefix minus binds stronger:� A ::= B | A - id B ::= -B | id� b) infix minus binds stronger
A ::= C | -A C ::= id | C - id�3) in two trees that used to be ambiguous instead of some A’s we have B’s in a) grammar or C’s in b) grammar.
4) -*id(-id)*