CS 445 – Homework 3 solution




5/5 - (5 votes)

1. The problem
In this assignment we will type expressions in the abstract syntax tree (AST) and start to perform
semantic analysis and error generation. We will also check to ensure that variables are declared before
being used and be able to warn when a variable might not be initialized before being used.
1.1 New Compiler Options
For this assignment your compiler will handle five command-line options. Some of these are repeats
from the last assignment. The following are the command-line options that your compiler will handle:
1. The -d option which sets the value of the yydebug variable to 1. This is a repeat from the last
2. The -D option which turns on symbol table debugging. Enabling this option will cause a line of
information to be printed for each action that your compiler performs with the symbol table. If
you are using the provided symbolTable.cpp code, just enable debugging in the SymbolTable
instance that your compiler creates.
3. The -p option which is a repeat from the last assignment.
4. The -P option which prints the AST with type information added. This information is printed
for symbols at their declaration and their use. The symbolTable.cpp code will assist with this
5. The -h option which will print the following usage message:
usage: -c [options] [sourcefile]
-d – turn on parser debugging
-D – turn on symbol table debugging
-h – print this usage message
-p – print the abstract syntax tree
-P – print the abstract syntax tree plus type information
Your c- must continue to accept a single input file from either a file name provided on the commandline or redirected from standard input.
1.2 Reorganize Your Code
Put your semantic analysis code in semantic.cpp and semantic.h. If you use the ourgetopt code,
place that in ourgetopt.cpp and ourgetopt.h. Update your makefile to build these targets by putting
their .o files in the dependency list for building c- and the line used to build c- with g++. You may find
it useful and educational to put your main() and associated routines in their own main.cpp file if you
haven’t done so already. This is not required.
1.3 Semantic Errors
We want to generate errors that are tagged with line numbers that are useful to the user. In order to
accomplish this, you will need to ensure that each node is tagged with the correct line number. To do
this effectively you will need to grab the line number as soon as possible (in flex) and associate it with
a given token. This can be done nicely (and portably) by passing a struct/class for each token from the
scanner to the parser in the yylval. This should contain all the information about the token such as the
line number, lexeme, constant value, and token class. You are probably already doing this. Recall that
you should avoid using pointers to global yy variables for token information because the parser
generated by yacc looks ahead and will possibly overwrite the variable that you are pointing to with
new information.
Once the information is passed in some form of struct/class, this information can then be stored in
nodes within your AST. This information can then used when traversing the AST to perform semantic
1.4 Scope and Type Checking
After checking if you should print the AST (just like in the last assignment) you will now traverse the
AST and search for typing and program structure errors. Your new main() might look something like
numErrors = 0;
numWarnings = 0;
if (numErrors == 0)
// check -p option

if (printSyntaxTree)
// print only info for declarations
printTree(syntaxTree, NOTYPES);
symbolTable = new SymbolTable; // instantiate the symbol table

// perform semantic analysis (may find errors when doing this)
semanticAnalysis(syntaxTree, symbolTable);
// check -P option
if (printAnnotatedSyntaxTree)
// print type info for all types
printTree(syntaxTree, TYPES);
// code generation will eventually go here…

// report number of errors and warnings
printf(“Number of errors: %d\n”, numErrors);
printf(“Number of warnings: %d\n”, numWarnings);
Or your main() might look quite different. You will likely want to perform more initialization before
you call yyparse(). The semanticAnalysis routine will process the AST by calling a treeTraverse
routine that will start at the root node of the AST and recursively call itself for children and siblings
until it reaches the leaf nodes in the tree. Declarations encountered during this process will make
entries in the symbol table (the symbol table is is discussed below) and references to symbols will be
checked by looking up the symbol name in the symbol table. Your goal in writing the treeTraverse
routine is to catch a variety of warnings and errors that are indicated in the example output files for this
assignment. You will need to duplicate the output presented in these example output files exactly for
each corresponding input file.
You should keep track of the number of errors and warnings and print them at the end of a run. The list
of errors and warnings that your c- should emit are shown below as format specifiers in printf format.
“ERROR(%d): ‘%s’ is a simple variable and cannot be called.\n”
“ERROR(%d): ‘%s’ requires both operands be arrays or not but lhs is%s an array and
rhs is%s an array.\n”
“ERROR(%d): ‘%s’ requires operands of %s but lhs is of %s.\n”
“ERROR(%d): ‘%s’ requires operands of %s but rhs is of %s.\n”
“ERROR(%d): ‘%s’ requires operands of the same type but lhs is %s and rhs is %s.\n”
“ERROR(%d): Array ‘%s’ should be indexed by type int but got %s.\n”
“ERROR(%d): Array index is the unindexed array ‘%s’.\n”
“ERROR(%d): Cannot index nonarray ‘%s’.\n”
“ERROR(%d): Cannot return an array.\n”
“ERROR(%d): Cannot use function ‘%s’ as a variable.\n”
“ERROR(%d): Symbol ‘%s’ is already declared at line %d.\n”
“ERROR(%d): Symbol ‘%s’ is not declared.\n”
“ERROR(%d): The operation ‘%s’ does not work with arrays.\n”
“ERROR(%d): The operation ‘%s’ only works with arrays.\n”
“ERROR(%d): Unary ‘%s’ requires an operand of %s but was given %s.\n”
“ERROR(LINKER): A function named ‘main()’ must be defined.\n”);
“WARNING(%d): The variable ‘%s’ seems not to be used.\n”
“WARNING(%d): Variable ‘%s’ may be uninitialized when used here.\n
To get a perfect match to the expected output it helps immensely if you just copy these format strings
and use them in your code. The type string that you replace in the message is often something like
“type int” or “unknown type”. The messages listed above are exactly the errors/warnings that you
must catch for this assignment. There are 16 error messages and 2 warning messages for this
assignment. In later assignments we will add more.
1.5 Type Checking for Specific AST Nodes
This section contains details regarding some errors that you should be checking for. This list is not
exhaustive. You can do whatever you like as long as your output matches the example output file
corresponding to a given input.
1.5.1 Declarations
For declarations, check to ensure that duplicate declarations do not exist. You can use the symbol table
to help with this. Recall that the arguments to a function in C- are contained in the outermost scope of
the function in which they appear (the first compound statement in the function definition). For
example, this C- code
int func(int x) begin int x; end
contains a duplicate declaration of the symbol x, whereas this C- code
int func(int x) begin begin int x; end end
does not contain a duplicate declaration since the innermost compound statement is a different scope
than the outermost compound statement. This is the way that C++ works.
1.5.2 Compound Statements
A new scope needs to be created for each compound statement. The symbolTable.cpp code that is
provided will permit you to created and destroy scopes, enter symbols in the symbol table, and check to
see if a symbol is already contained in the table. For documentation purposes, you can ascribe a label of
your choice to a given scope using the SymbolTable::enter method. For example, enter(“Compound
Statement”) will create a new scope with the label “Compound Statement” associated with it. If you
enable debugging in the symbol table, using a distinct identifier for each scope name may make the
debugging information more readable.
1.5.3 Assignments and Other Operators
Your code should check to ensure that types provided for assignments and other operators are correct.
The type information associated with an expression will have to be passed up the tree (synthetic
attributes) so that they can be checked by the operators that use it. Note that many operators have an
expected type that they produce. The + operation, for example, will produce a type int. Be sure to set
this sort of information as you traverse the tree. It might be useful to have an “undefined type” or
“placeholder type” for use when the type of an entity is undefined or is not yet known.
Consider using an array or clever function to determine what types are required for operands to specific
operators rather than using nested if statements. Use this same strategy to determine what type the
result of an operation will be. The number of cases required for type checking has been intentionally
limited, making it easier to design functions for this purpose. The types of all operands and return
types for all C- operators are contained in the C- grammar document on the course website. Note the
1. The = and != operators take operands that are the same type and produce a value of type
Boolean. The operands to both of these operators can be arrays.
2. The <- operator takes operands that are of the same type and produces the type of the lhs. This
means that the <- operator will produce the type of the lhs regardless of whether or not it is
undefined. This is because assignment in C- is an expression and can be used in cascaded
assignment like: a <- b <- c <- 314159265; (An intentional and interesting side note is that an
assignment is NOT a simpleExp! How does that effect what you can put in say… an if statement
predicate? Python does this sort of thing.)
3. The ++ and – – operators in C- take an integer operand and immediately produce an integer
result. This is different than the way that they behave in C/C++.
1.5.4 Identifiers
All C- identifiers must be unique, and variables must be declared before they are used. Your code must
check to see if a variable has been declared. If a referenced identifier has been used in a declaration,
set the type of the identifier node to the type of the declaration and associate a pointer to the declaration
node with the symbol table information for that identifier. This is very useful because it permits you to
quickly find your way to the declaration node of a given variable and allows you to store all of the
information about a given variable with its declaration. If the declaration associated with a given
identifier cannot be found, set the type of the identifier to an “unknown type” or some other indicator
that the type information is missing. To prevent cascading errors, undefined types do not provoke an
error when being compared with an expected type (e.g., int produced by the + operator). We assume
that an error was generated when a given identifier was marked as being of an unknown or undefined
type. For this assignment, each reference to an undeclared identifier must generate an error message.
This will likely be changed later.
For this assignment you must issue a warning for the possibility that a variable is being used before it
was initialized. This applies strictly to locally declared variables; not globals, statics, or parameters. If
the rhs value of a variable is used before (appears in the tree traversal before) it has been initialized or
appears on the lhs of an assignment, a warning must be issued. This means that an identifier being on
the lhs in a binary assignment or being initialized on declaration will cause the variable to be marked as
For identifiers, C- permits arrays that are indexed. An identifier used in this form becomes a non-array
type because it has been indexed. This means that type type of the AST ‘[‘ node is the type of the lhs
of the operation. Be sure to check for attempting to index an identifier that is a non-array type and
using identifiers of an array type in a non-indexed manner where this is illegal. Identifiers that are
arrays can be prefixed with the * operator to determine the number of items in the array. This operator
produces an int.
1.5.5 Functions
Functions in C- cannot return an entire array. Your c- must check for this and ensure that is not
permitted. Functions in C- can have a return type of void. This means that the specified function
cannot return a value. An implication of this is that it is possible for the type void to appear in an error
message. All C- programs must contain a function named main. If, after processing the entire tree
representing a program, there is no function named main in the symbol table, you must print an error
message of the form indicated earlier in this assignment.
1.6 The Symbol Table
A symbol table that you can use in your program is available on the course website. The version
provided uses the C++ standard template library. Feel free to augment it or to build your own. Though
the version provided uses the std::string type as the argument type in many places, you can convert
a const char * to and from a std::string. The SymbolTable::insert method allows you to
associate a pointer with a symbol in the table. This pointer could be used to associate a symbol name
with the tree node where the symbol was declared. The SymbolTable::enter and
SymbolTable::leave methods allow you to manage the stack of scopes associated with a program.
The scopes within a SymbolTable are always managed via these methods. The SymbolTable::insert
method will return false if you attempt to insert a symbol that has already been defined. The
SymbolTable::lookup method will return NULL if you attempt to look for a symbol that is not defined.
One feature of the symbol table is the debug flag. At construction time the SymbolTable object is in
non-debugging mode. But by setting the flag with the SymbolTable::debug method you can get the
symbol table to print out information. You can also just print the symbol table if using the provided
SymbolTable::print method. You might consider starting out by printing the symbol table when
exiting from a scope using the debug flag.
The SymbolTable::print method takes a print function that will print a tree node. So if you define
something to print the data in a node given a TreeNode* then you can supply the name of that print
function to the SymbolTable::print method to print out your symbol table stack. That way the code
doesn’t have to know what your TreeNode looks like internally. For instance
symtab = new SymbolTable();
creates the symbol table. If you have defined a function to print a tree node with a prototype like this:
void nodePrint(void *p)
You can print the symbol table by just doing this:
which will print each void * in the symbol table using your supplied nodePrint function.
2. Deliverables and Submission
Your homework will be submitted as an uncompressed .tar file that contains no subdirectories. Your .tar
file must contain all files and code necessary to build and test your compiler. If you use ourgetopt, be
sure to include the ourgetopt files in your .tar archive.

The .tar file is submitted to the class submission page. You can submit as many times as you like. The
last file you submit before the deadline will be the only one graded. No late submissions are
accepted for this assignment. For all submissions you will receive email at your uidaho address
showing how your file performed on the pre-grade tests. The grading program will use more extensive
tests, so thoroughly test your program with inputs of your own. The error messages that your c- emits
will be sorted before comparison with the expected output so that the order in which your messages are
printed is not as important as it otherwise would be.
Your code must compile successfully and have no shift/reduce or reduce/reduce conflicts from bison.
Your program must run with no runtime errors such as segmentation violations (SIGSEGVs).