4 Implementation Details of Tiger Type Checker
So I wrapped my Tiger type checker assignment. It’s 973 lines of code as of first submission. And it’s not even styled in the most readable way. But anyway, I hope only minor modification is needed to the code in the future.
Prof. Appel is certainly quite succinct when he discusses about the type checker, albeit he does in general a good job of pointing a starting point of writing the code. I’d like to add a few comments below after actually writing the code.
Oh, appendix A is awesome. Short but not missing detail. Make sure you have read it many times before writing the code. When in doubt, refer to appendix A.
Here are the 4 details of writing the type checker. I believe the use of T.NAME is not explained sufficiently in the textbook, so I added my comments below:
Strucutre of Sement.sml
- transty: translate a type declaration sentence into T.ty
- tranexp: deal with expressions and var
- g: deal with expressions
- h: deal with var
- transdec: add a block of declarations (var, func, type) into env/tenv
- transdecs: deal with a list of block of declarations.
- transprog: program entrance.
T.NAME
T.NAME is created during type decalration in transdec
. In a sequence of mutual recursive
declarations, types declared in an earlier line may not see types declared
later. So we do a preprocess and insert a placeholder for all the types
before hand. This placeholder is T.NAME.
T.NAME is made up of two parts: a symbol and a T.ty option ref. The symbol is simply the name of the type. The latter refers to the actual type of this binding.
During declaration, all type headers are added to the type environment with
T.NAME(“type_name”, ref NONE). In a second pass, the actual type of the bindings
are determined with transty
and the bindings are set. In the thrid pass,
T.NAME are striped away with actual_ty
and the binding is added to tenv.
Notice after transdec
, there should be NO T.NAME entries in the tenv.
But T.NAME may still show up in RECORD/ARRAY entries in tenv. Example:
type a={x:b}
type b=array of a
The final tenv entries:
a --> T.RECORD([T.NAME("b", ref b)], u1)
b --> T.ARRAY(T.NAME("a", ref a), u2)
T.NAME still exists! But that’s fine, because:
- the entries in tenv is not T.NAME.
- we have the correct type name where that field/element type is bound to.
So, where we might run into accessing a field/element type?
g(RecordExp)
/g(ArrayExp)
<— for actual record/array construction.h(FieldVar)
andh(SubscriptVar)
<— for retriving the type of field/array element.
When we run into NAME in these functions, simply look up the true type from tenv with its name!
To conclude, these methods need to deal with NAME:
transdecs(tydecs)
: where T.NAME is born, set, and striped. :Dtransty(RecordTy, ArrayTy)
: it is used in the middle of transdecs, so totally valid for T.NAME to show up. However it is OPTIONAL to deal with T.NAME here. Because the tenv symbol table is incomplete.actual_ty
: recursively dive into the ref part when it sees a T.NAME until it cannot do so.RecordExp/ArrayExp
: T.NAME will show up when looking at the field of a RECORD/ element type of an ARRAY. Simply look it up from tenv.h(FieldVar/SubscriptVar)
: similar to above.
In the latter two types of methods, make sure the final return type is anything but T.NAME so that it will never show up anywhere else and plague the code.
Cycle declarations
Another use of T.NAME is to detect cycles. Assuming we have a cycle:
type a = b;
type b = a;
Before striping the T.NAME in the thrid pass of transdec
, in tenv:
a --> T.NAME(a, ref b)
b --> T.NAME(b, ref a)
actual_ty
will recursively dive into the ref part and strips away the T.NAME
wrap. Notice it looks just like a graph traversal, each T.NAME is an edge. To
detect cycle, simply record all nodes in a visited array, and check if current node
is in that array.
Assignment to loop control variables | Check whether break is in loop
So this is where we run into the limit of a hash table based symbol table and functional way to maintain these tables. In every handling of the environment, we only have access to the current symbol table. But to perform these checks, we need to traverse upstream in the scopes.
We maintain a stack to do so, each stack entry has a scope name where this
scope belongs to, and a var list of what varaible is declared in this scope.
Those scope that may introduce variable masking, and is relavent to loops are
FOR
,WHILE
,FUNCTION
,LET
.
We maintain the integrity of the stack in a functional manner. So we need to change the header of several methods to allow the stack as an argument.
Specifically:
- In
transexp
A.ForExp
,A.WhileExp
,A.LetExp
will push a new entry to the stackA.BreakExp
andA.AssignExp
need to check the stack.
- In
transdec
Vardec
should add all declared var ids to the stack top var list.Funcdec
should also add func ids to stack top var list. For each fundec, need to push a new entry to the stack and all the function arguments to it. The function body is then evaluated with the new stack. The Funcdec itself should only return the stack containing the func ids.- These methods should return the augmented envstack.
- Typedec is not affected since no venv ids are involved.
- In transdecs
- When previous dec block returns the augmented stack, it should be passed to the next dec block.
- In transprog
- Use
nil
as initial stack.
- Use
Duplicate func/type dec
For every block of declists, we preprocess them and only keep the FIRST appearence of the ID and ignore all other duplicate declarations.