COS320, Spring 2015: Compiling Techniques

Type Checker

Implement a type-checker for the Fun language.

Download the archived file for as3 and extract it to the project directory. We are going to use libraries from the last assignments, so don’t delete them.

cd $COS320_SRC_ROOT
wget http://www.cs.princeton.edu/courses/archive/spring15/cos320/homeworks/as3/as3.tar.gz
tar xfvz as3.tar.gz
rm as3.tar.gz

If your own fun.l and fun.y work well, use them. If you want to use your own version, copy those files into your new type checker tool directory:

cd $COS320_SRC_ROOT
cp tools/parser/fun.l tools/parser/fun.y tools/tchecker

You can also create symbolic links for these files instead of copying them.

cd $COS320_SRC_ROOT/tools/tchecker
ln -s ../parser/fun.l
ln -s ../parser.fun.y

If you want to use TA’s parser, we provide compiled files for the lexer and the parser for you (lex.yy.cpp, fun.tab.hpp, and fun.tab.cpp). In this case, additionally download and unzip this file:

cd $COS320_SRC_ROOT
wget http://www.cs.princeton.edu/courses/archive/spring15/cos320/homeworks/as3_ref_parser/as3_ref_parser.tar.gz
tar xfvz as3_correct_parser.tar.gz
rm as3_correct_parser.tar.gz

tools/tchecker/tchecker.cpp will call TypeChecker class, whose header file is include/Visitor/TypeChecker.h and source file is lib/Visitor/TypeChecker.cpp. Your main job is to edit those two files to implement a type checker for Fun language.

Edit include/Visitor/TypeChecker.h and lib/Visitor/TypeChecker.cpp to implement the functions visit, sub, and join. The typing rules for the Fun language are described in the Fun language definition. You need to implement these rules. The main catch is that these rules are declarative. In other words, they are nondeterministic and do not directly specify an algorithm for type checking. You need to implement a (deterministic) algorithm for type checking that is sound and complete with respect to the nondeterministic rules. Your algorithm will be sound if whenever it says a program type checks, there exists some typing derivation in the nondeterministic, declarative type system for the program. Your algorithm will be complete if whenever there exists some typing derivation, your algorithm will find the typing derivation and say the program in question type checks.

The straightforward way to implement various functionality (code pretty-printer, type checker, …) is to add those methods directly into AST classes; that would require adding prettyprint() or typecheck() method to every AST class. But the type-checker class has been designed in the visitor design pattern. The visitor design pattern is a way of separating an algorithm from an object structure on which it operates. You can specify all necessary algorithms in TypeChecker class alone without modifying AST classes.

To understand the visitor pattern, make sure you are familiar with the concept of virtual functions first. All Java functions are virtual by nature, but you need to use ‘virtual’ keyword before the declaration of a function to make it a virtual function. In short, virtual functions are functions whose behavior can be overridden within an inheriting class by a function with the same signature. The reason we need the visitor pattern is that unlike ML, C++ does not support double dispatch naturally. C++ supports single dispatch through virtual methods, which means, assuming the first argument of every member function is ‘this’ pointer, C++ runtime will call different virtual method based on the class instance. But C++ does not do the same thing for other ‘arguments’ of functions, unlike ML. Java does not support double dispatch naturally either. The visitor pattern is a way of simulating this double dispatch in languages like C++ or Java.

Ideally, every AST class needs to have one ‘void accept(Visitor &v)’ function that works for all visitor subclasses. But in our case the visit() and accept() for TypeChecker class needs a return type, so we need a separate accept() method in each AST class. We have already defined this method in every AST class, so you only need to edit TypeChecker class itself. First you might want to look at CodePrinter or Interpreter class in the same directory with TypeChecker. They were written in the visitor pattern too. Unlike CodePrinter, TypeChecker does not need to visit ***TypeAST classes; we have added some dummies for these cases for you. And unlike Interpreter which only evaluates functions that are called during the execution, TypeChecker should check all defined functions. Note that while the Interpreter does some limited form of error checking, just mimicking its behavior is not sufficient for the TypeChecker.

MyType class is declared in include/AST/MyType.h. Don’t confuse this with TypeAST class and its subclasses; those are AST nodes. We provide a host of helpful functions in MyType class and its subclasses. MyType class itself is a base class for all types but it also serves as a ‘factory of types’ providing many static helper methods. You are not allowed to instantiate subclasses of MyType class directly by calling ‘new’ (and you can’t because their constructors are private anyway). Use ‘MyType::get***Type()’ methods defined in MyType class instead. Also, given a subclass instance of TypeAST class, you can use ‘MyType::getType(TypeAST *tnode)’ function to get a matching type instance. This makes memory management easier; all those MyType instances returned by get***Type methods will be deleted by MyType class so you SHOULD NOT delete them manually. There are also many other helper methods in the MyType class and its subclasses. Have a look at include/AST/MyType.h and it should be straightforward.

From now on, if you ever need to use C++ RTTI (dynamic_cast<> or typeid<>), use dyn_cast<> and isa<> instead, which are LLVM-style RTTI. But for this assignment, helper functions provided would be mostly sufficient to finish the task.

It turns out that this problem really only bites hard in the rule for if-then-else. First we'll describe how to handle everything besides if-then-else. When you call sub(t1,t2), the function should return true if t1 is a subtype of t2 and return false otherwise. The subtyping rules in the Fun language definition are nondeterministic and cannot be implemented directly primarily because of the transitivity rule. The transitivity rule says:

t1 < t2 t2 < t3

------------------------ (transitivity)

t1 < t3

If we were trying to read this rule bottom-up like a function then it would say t1 is a subtype of t3 if we can magically conjure up some type t2 such that t1 is a subtype of t2 and t2 is a subtype of t3. Fortunately, if restructure our other subtyping rules a little bit to make sure that overall our subtype in relation is transitive, we don't really need to implement the transitivity rule directly. In particular, this means it will be necessary to combine the rules for tuples and check all of the conditions for subtyping on tuples at once.

The next part of the algorithm involves implementing the type checking algorithm for expressions themselves. This algorithm will invoke sub when necessary. Notice that in the declarative rules there is a troublesome rule called subsumption:

G |-- e : t t < t'

-------------------------

G |-- e : t'

Implementing this rule directly in the function visit is problematic because if we try to read this rule as a function bottom-up in which the arguments to the function are the context G, the expression e and the type t' then we will somehow have to "guess" what type t to recursively invoke visit with. Moreover, the subsumption rule can always be used. A stupid type checker might try to check subsumption over and over again and go into an infinite loop.

Instead, what we will do is call visit(exp) and expect it either to fail (by using TypeChecker::reportError()) or to succeed and return the least type t that e can possibly have in the current context TypeChecker::ctxt.

If the type-checker reports an error, it shouldn't necessarily give up. It should try to recover so that it can report other errors to the user. One way to recover is just pretend that the erroneous expression has type int and hope for the best.

The only places that we will use subtyping in our algorithm is when we absolutely have to (or else type checking will fail). For instance, one place that we might need to use subtyping is when we have a function call f (e). If f has type t1 -> t2 and e has type t1' then our type checker should succeed if t1' is a subtype of t1. Then, the type of f(e) can be any supertype of t2; in order to report the least supertype of t2, then just return t2 as the result of visit(“f(e)”).

Note: The type-checker needs to maintain a context; it needs to have a way of binding and freeing symbols in the context. Context class is provided to serve this purpose in include/Util/Context.h. Context class itself is a general class template to store keys and values of any types, but you will notice there is an instantiation of this template in TypeChecker class, as follows:

Context<std::string, MyType*> ctxt;

You can use functions from Context class (bind, undoOne, …) to manage context in the TypeChecker class.

The most difficult problem is the type-checking of if-then-else. Consider this Fun expression, in a context where f has type int->int:

if x then <0, f> else <1,2,3>

This expression is guaranteed to return a record whose first field is an int. That is, the least supertype of all the values that could be returned by this if-expression is <int>. Below we will describe the high-tech solution to this problem. Until you get everything else working, make the following approximation: the then-clause and the else-clause must have exactly the same type, that is,

join(t1,t2) = if (t1=t2) then return t1; else (reportError("t1 and t2 do not join"); return t1;)

This is not faithful to the Fun semantics; it will reject some legal Fun programs. However, it will not accept any illegal Fun programs. (This algorithm is sound but not complete.)

As usual, compile by invoking ‘make’. You can test your code by running tchecker tool:

cd $COS320_SRC_ROOT
make
./Debug+Asserts/bin/tchecker FUN_FILE

Make sure you remove all the warnings and you don’t have any testing printf statements uncommented left in the code, because they will result in deductions.

Checklist:

Did you check that main is int->int?
You should allow such silliness as (fun f(x:int):<int>=let x=3 in let x=<x> in x). That is, a "let" binding is permitted to hide an outer binding of the same name in a surrounding context.
You should even permit (fun g(int:<int>): <int> = int). That is, the name space of types (which in basic Fun has only "int" in it) is orthogonal to the name space of values.
Don't just give up at the first type error. Try to recover so that other errors can be found. The typical way to do this is to call ErrorMsg.error, then return the integer type, or the Tuple<> type, or whatever.

After you have gotten everything else working technically, implement accurate type-checking for if-then-else. See the note below.

After you have gotten everything working technically, and as time permits, improve the informativeness of your error messages. 10% of the points will be given on the basis of the subjective usefulness of the type-error messages.

When you specify line and column number locations in reportError(), try to make them as specific as possible. For example, when the type of an argument in a function call is incorrect, report not the whole function call expression’s location but the argument’s location.
Don’t include a newline character(\n) in your error messages.

After you have done all the above, make incremental changes that increase the elegance, cleanliness, and readability of your program.

Submit a README, as usual, and TypeChecker.cpp and TypeChecker.h, to dropbox here. In case you want to submit modified versions of fun.l or fun.y, you can use the "Upload optional file" feature. We also welcome test case FUN files as in the previous assignments.

Tips and Tricks

Least common supertype

To properly implement the type-checking of if-then-else, you need to implement an algorithm that computes the least common supertype of two types, also called the join of two types.

When you call join(t1,t2) it should return a third type t3 such that t1 is a subtype of t3 and t2 is also a subtype of t3. If no such type exists, use reportError() to report an error. Furthermore, join should return the "least" type t3. For example,

join (<int, (int->int)>, <int>) = <int>

Even though <> is a supertype of both <int,(int->int)> and <int>, it is also a supertype of <int>. Therefore, the join function should return <int> not <>. <int> is the least supertype of both <int,bool> and <int> and therefore it is a "join." Hint: there is a substantial twist when implementing the join function for function types. You'll need to implement a second, mutually recursive helper function in this case, to compute the greatest common subtype, called the "meet".

For debugging, using gdb is essential when you are programming in C or C++. If you haven’t used it, this is not that different from Java eclipse debugger or other GUI-based debuggers; you only need to know a few commands. This guide can be a good first step if you haven’t used gdb. Note that our code is already being compiled in debug setting (i.e., without optimizations and with debug symbols), you don’t need to change anything in the Makefile.

Available from: Friday, 26 February 2015

Due date: Monday, 23 March 2015, midnight

COS320, Spring 2015: Compiling Techniques

You are here

Type Checker

Tips and Tricks