In this assignment, you will use bison to build a parser for the Fun language. Chapter 3 of your textbook (particularly 3.4) explains many of the concepts you'll need to know to complete this assignment. You will also need to refer to the Fun language reference.
Bison, a GNU variant of yacc, is a LALR parser generator, which means you don’t need to implement the parsing algorithm yourself; bison will do that for you. Bison reads context-free grammar in Extended Backus-Naur Form (BNF). Bison takes an input file with the extension ‘y’ as input. The structure of a bison input file is as follows:
%{ headers %} Declarations %% Grammar Rules %% User Codes |
Given FILENAME.y, bison will generate FILENAME.tab.c and FILENAME.tab.h files when you run this command:
When using flex with bison, you need to include the FILENAME.tab.h file generated by bison in FILENAME.l file.
The below is a simple example of flex and bison files together.
<calc.l> /* Definitions */ %{ #include "calc.tab.h" %} NUMBER [0-9]+ %% /* Rules */ {NUMBER} { yylval = atoi(yytext); return NUMBER; } "+" { return PLUS; } "*" { return MULT; } %% /* User Code */ | <calc.y> /* Definitions */ %{ #include <stdio.h> %} %token NUMBER PLUS MULT %% /* Rules */ exp: NUMBER { $$ = $1; } | exp PLUS exp { $$ = $1 + $3; } | exp MULT exp { $$ = $1 * $3; } ; %% /* User Code */ void yyerror(char *s) { printf("%s\n", s); } |
When we apply the rule exp: exp PLUS exp { $$ = $1 + $3; } we replace the right-hand side of the production in the parse stack with the left-hand side of the same production. In this case we pop “exp PLUS exp” and push “exp”. We have reduced the stack by popping three terms off the stack and pushing back one term. We may reference positions in the value stack in our C code by specifying “$1” for the first term on the right-hand side of the production, “$2” for the second, and so on. “$$” designates the top of the stack after reduction has taken place. The above action adds the value associated with two expressions, pops three terms off the value stack, and pushes back a single sum.
For rules that references the terminal symbol like NUMBER,
exp: NUMBER { $$ = $1; }
The value returned for $1 is yylval value we assigned in .l file. So in this case the type of $1, $3, and ‘exp’ should be int.
You can specify associativities of operators in bison. The relative precedence of different operators is controlled by the order in which they are declared. The first precedence/associativity declaration in the file declares the operators whose precedence is lowest, the next such declaration declares the operators whose precedence is a little higher, and so on.
%left PLUS %right NOT %nonassoc ASSIGN |
In assignment 1, we provided tokens.h file with the declaration of YYSTYPE. This is in fact supposed to be generated from bison input file using %union option. All nonterminal symbols in the left-hand side of production rules in .y file should be assigned a type which is declared in this union. In the example above, we are using ‘exp’ as a nonterminal symbol and its type is int. So if you include this in a FILENAME.y file,
%union { int expr; char *id; } %token <id> ID %type <expr> exp |
Resulting FILENAME.tab.h file will contain something like this:
typedef union YYSTYPE { int expr; char *str; } YYSTYPE; |
There are cases the type of a nonterminal symbol is different from int or char*.
Here we give a simple example. For example, suppose we represent all expressions using a class named ‘Exp’, which is the base class of all expressions. There are three child classes inheriting from this Exp: IntExp for numbers, PlusExp for ‘plus’, MultExp for ‘mul’ expression.
The prototype of constructors of the classes are:
IntExp::IntExp(int num);
PlusExp::PlusExp(Exp *lhs, Exp *rhs);
MultExp::MultExp(Exp *lhs, Exp *rhs);
Then calc.y file can be as follows:
... /* Rules */ exp: NUMBER { $$ = new IntExp($1); } | exp PLUS exp { $$ = new AddExp($1, $3); } | exp MULT exp { $$ = new MultExp($1, $3); } ; ... |
Now the type of ‘exp’ is not int but ‘Exp*’. So the type union declaration becomes:
%union { Exp *expr; ... } %type <expr> exp ... |
These are some helpful documents for flex and bison. You can find more on internet.
and add these lines in that place.
‘using namespace fun;’ will save your typing effort a little when you are implementing fun.y. This line should be added before ‘#include “fun.tab.hpp”.
Note that in our Makefile, bison will generate not fun.tab.c/fun.tab.h but fun.tab.cpp/fun.tab.hpp. This allows us to use C++ features in fun.y and makes linking with parser.cpp a bit easier. (i.e., you don’t need ‘extern “C”’)
The nonterminals you use can have any names that you want. For example,
But I recommend more descriptive names than "foo".
This will generate some syntax errors if your grammar is not correct. If it is correct, it will print the message “Program AST has not been generated”, which means the parsing succeeded but you haven’t constructed AST for the program yet.
This will print the code of the parsed program and runs the interpreter on the program. Note that the interpreter can cause a segmentation fault in case of an infinite recursions, and this is not a bug of the interpreter.
Upload your submissions to dropbox here.
Available from: Friday, 13 February 2015
Due date: Tuesday, 24 February 2015, midnight