Published using Google Docs
COS320: Assignment 2
Updated automatically every 5 minutes

COS320, Spring 2015: Compiling Techniques

You are here

 

Parser

In this assignment, you will use bison to build a parser for the Fun language. Chapter 3 of your textbook (particularly 3.4) explains many of the concepts you'll need to know to complete this assignment. You will also need to refer to the Fun language reference.

Bison, a GNU variant of yacc, is a LALR parser generator, which means you don’t need to implement the parsing algorithm yourself; bison will do that for you. Bison reads context-free grammar in Extended Backus-Naur Form (BNF). Bison takes an input file with the extension ‘y’ as input. The structure of a bison input file is as follows:

%{

headers

%}

Declarations

%%

Grammar Rules

%%

User Codes

Given FILENAME.y, bison will generate FILENAME.tab.c and FILENAME.tab.h files when you run this command:

When using flex with bison, you need to include the FILENAME.tab.h file generated by bison in FILENAME.l file.

The below is a simple example of flex and bison files together.

<calc.l>

/* Definitions */

%{

#include "calc.tab.h"

%}

NUMBER [0-9]+

%%

/* Rules */

{NUMBER} { yylval = atoi(yytext); return NUMBER; }

"+"      { return PLUS; }

"*"      { return MULT; }

%%

/* User Code */

<calc.y>

/* Definitions */

%{

#include <stdio.h>

%}

%token NUMBER PLUS MULT

%%

/* Rules */

exp: NUMBER { $$ = $1; }

   | exp PLUS exp { $$ = $1 + $3; }

   | exp MULT exp { $$ = $1 * $3; }

   ;

%%

/* User Code */

void yyerror(char *s) {

  printf("%s\n", s);  

}

When we apply the rule exp: exp PLUS exp { $$ = $1 + $3; } we replace the right-hand side of the production in the parse stack with the left-hand side of the same production. In this case we pop “exp PLUS exp” and push “exp”. We have reduced the stack by popping three terms off the stack and pushing back one term. We may reference positions in the value stack in our C code by specifying “$1” for the first term on the right-hand side of the production, “$2” for the second, and so on. “$$” designates the top of the stack after reduction has taken place. The above action adds the value associated with two expressions, pops three terms off the value stack, and pushes back a single sum.

For rules that references the terminal symbol like NUMBER,

exp: NUMBER { $$ = $1; }

The value returned for $1 is yylval value we assigned in .l file. So in this case the type of $1, $3, and ‘exp’ should be int.

You can specify associativities of operators in bison. The relative precedence of different operators is controlled by the order in which they are declared. The first precedence/associativity declaration in the file declares the operators whose precedence is lowest, the next such declaration declares the operators whose precedence is a little higher, and so on.

%left PLUS

%right NOT

%nonassoc ASSIGN

In assignment 1, we provided tokens.h file with the declaration of YYSTYPE. This is in fact supposed to be generated from bison input file using %union option. All nonterminal symbols in the left-hand side of production rules in .y file should be assigned a type which is declared in this union. In the example above, we are using ‘exp’ as a nonterminal symbol and its type is int. So if you include this in a FILENAME.y file,

%union {

  int expr;

  char *id;

}

%token <id> ID

%type <expr> exp

Resulting FILENAME.tab.h file will contain something like this:

typedef union YYSTYPE

{

  int expr;

  char *str;

} YYSTYPE;

There are cases the type of a nonterminal symbol is different from int or char*.

Here we give a simple example. For example, suppose we represent all expressions using a class named ‘Exp’, which is the base class of all expressions. There are three child classes inheriting from this Exp: IntExp for numbers, PlusExp for ‘plus’, MultExp for ‘mul’ expression.

The prototype of constructors of the classes are:

IntExp::IntExp(int num);

PlusExp::PlusExp(Exp *lhs, Exp *rhs);

MultExp::MultExp(Exp *lhs, Exp *rhs);

Then calc.y file can be as follows:

...

/* Rules */

exp: NUMBER { $$ = new IntExp($1); }

   | exp PLUS exp { $$ = new AddExp($1, $3); }

   | exp MULT exp { $$ = new MultExp($1, $3); }

   ;

...

Now the type of ‘exp’ is not int but ‘Exp*’. So the type union declaration becomes:

%union {

  Exp *expr;

  ...

}

%type <expr> exp

...

These are some helpful documents for flex and bison. You can find more on internet.

  1. Download files for as2:

  1. For this assignment you will use the fun.l you made in assignment 1. If yours doesn't work well enough to be usable, ask the Teaching Assistant for a working one.

  1. Edit one line in tools/parser/fun.l file you implemented in the last assignment. We provided tokens.h file in as1 which fun.l included, but in this assignment, declarations in tokens.h file will be provided from fun.tab.hpp file we generate from fun.y using bison. Delete this line from fun.l,

and add these lines in that place.

‘using namespace fun;’ will save your typing effort a little when you are implementing fun.y. This line should be added before ‘#include “fun.tab.hpp”.

Note that in our Makefile, bison will generate not fun.tab.c/fun.tab.h but fun.tab.cpp/fun.tab.hpp. This allows us to use C++ features in fun.y and makes linking with parser.cpp a bit easier. (i.e., you don’t need ‘extern “C”’)

  1. Edit tools/parser/fun.y until it specifies a parser for the Fun language. Don't worry at first about filling in the semantic actions; just get the parsing and grammar disambiguation taken care of.

The nonterminals you use can have any names that you want. For example,

But I recommend more descriptive names than "foo".

This will generate some syntax errors if your grammar is not correct. If it is correct, it will print the message “Program AST has not been generated”, which means the parsing succeeded but you haven’t constructed AST for the program yet.

  1. Once you have your grammar disambiguated and debugged, fill in the parser's semantic actions. These should generate abstract syntax in the Fun intermediate language. We provide all AST classes files for you; their header files are in include/AST/ and source files are in lib/AST/, but knowing interfaces in header files should be sufficient for you. To use them, you need to include “AST/AST.h” file from your fun.y file. You need to specify rules to create instances of AST classes in the parser’s semantic actions. Your final resulting ‘Program*’ instance should be assigned to ‘ProgramAST *program;’ variable defined in the top of fun.y.

  1. The file tests/all.fun tests all syntactic constructs in a trivial way but you will certainly want to write other fun files to debug your compiler. If you want to submit additional test files, please do so. We might use them to test the other projects! In a /* comment */ at the beginning of each test file, explain what particularly is being tested, and what the expected printing result/return value should be. If your test expected to produce parsing errors, copy the expected error messages into the comment.

  1. tools/parser/parser.cpp uses CodePrinter class and Interpreter class. (Their class declarations are in include/Visitor/ and implementations are in lib/Visitor/) CodePrinter takes a Program* class pointer created by fun.y and pretty-prints the program trying to minimize the number of parentheses, while preserving the semantics of the program. Interpreter evaluates the program and prints the output value. You don’t need to understand how CodePrinter or Interpreter works in this assignment though.

  1. Once you are all done with generating AST, you can build your program.

This will print the code of the parsed program and runs the interpreter on the program. Note that the interpreter can cause a segmentation fault in case of an infinite recursions, and this is not a bug of the interpreter.

  1. Submit your README, fun.l, fun.y, and any *.fun test files you wish to. As in the previous assignment, in the README, please write,

Upload your submissions to dropbox here.

Available from: Friday, 13 February 2015

Due date:           Tuesday, 24 February 2015, midnight