CS 445 – Homework 1 to 7 solutions

$150.00

Original Work ?
Category: You will Instantly receive a download link for .ZIP solution file upon Payment

Description

5/5 - (1 vote)

CS 445 – Homework 1

1. The problem
Use a combination of Flex and Bison code as instructed in class to build and drive a scanner for the Cprogramming language. A grammar for C- is available on the course website. Your scanner will be
named c- (note the lowercase). That is, c- will ultimately be the compiler for the language C-. Your
scanner will read and process a stream of characters representing tokens from a file. The filename may
be given as an argument to the c- command, or the input can come from standard input if the filename
argument is not present. This means that the user can pass the C- code in file filename.c- to c- in
any of the three following ways:
$bash> c- filename.c-
$bash> cat filename.c- | c-
$bash> c- < filename.cTo get this to work will require that you be able to optionally read a file from the command line. You
will need to define arguments to main and use the yacc/bison variable named yyin to read from the file
that the user has specified. You should use fopen to get a FILE* that you then assign to yyin.
For this assignment you are to build a scanner using lex/flex and a driver for the scanner using
yacc/bison. We will be using both lex and yacc for all other assignments as well, so figuring out how
to get them to work together will be worth your effort. The machine cs-445.cs.uidaho.edu is available
for class use using your UI credentials. That is where the grading will occur and your homework
must compile and run on this machine. There are no exceptions to this.
1.1 The flex Part
Build a flex scanner that returns a token class for each token in the C- grammar. For numbers it should
also “return” a numerical value and the string that the user typed. For identifiers (ids) it should also
return a string. It should treat the Booleans “true” and “false” as keywords but internally treat them as
members of the token class BOOLCONST with values 0 and 1 for false and true respectively. The
scanner should ignore comments and whitespace characters.
Your scanner will generate an error if there is an illegal character in the input. This occurs when none
of the token patterns match the the current location in the input stream. See the example output for the
exact wording of the error message to be used in this case. No token will be returned in the case of this
error and scanning will continue.
Your scanner will generate an warning if a character constant is given with more than one character
(i.e. ‘dogs’). If this happens the first character will be used, the remaining ignored, and a warning will
be issued. See the example output for the exact wording of the error message to be used in this case.
Your scanner will generate an error if a character constant is given that contains no characters (i.e. ‘’).
No token will be returned and the token ignored. See the example output for the exact wording of the
error message to be used in this case.
Your scanner will keep track of the line number of each token encountered and return it with the token
and a string and and/or numeric representation of the token, if appropriate, in a struct or class instance.
This can be accomplished by using yylval to contain a pointer to a struct or class instance that you
create and want to return.
You may not use YYSTYPE for this assignment. Points will be deducted if you choose to ignore this
warning. Directly accessing YYSTYPE is considered poor programming practice. The flex and bison
tools provide the %union construct to facilitate defining a symbol type for yylval without stripping
away the insulation provided by YYSTYPE. Please use %union and not YYSTYPE.
1.2 The bison Part
Build a simple bison parser that accepts any stream of legal tokens from the scanner. You will have to
come up with the simple grammar for this. This is a grammar for just a stream of any legal tokens,
it is not the grammar for C-! This will serve as a “driver” to drive your scanner (the flex part). The
bison part prints out the line number, the token type, and any extra information returned by the scanner.
See example output to see what it should look like.
Your homework 1 program will just recognize tokens in C- and will not recognize grammatical
constructs of C-. One of the goals of this assignment is to get the basic build and communication
between flex and bison up and running. A template for constructing your bison file for this assignment
is contained on the course website.
1.3 The Test Data
Sample test inputs and their corresponding outputs are provided for this assignment. These are
available on the course website. You should examine these inputs and their corresponding outputs very
carefully and ensure that your program behaves precisely like these examples.
Character constants in test files may be the NUL character (‘\0’) and will therefore print in a way that
cannot be seen if you use a “%c” format string. This is expected. Be sure to use the “%c” format
string for all characters on this homework. Failure to do so will result in output that does not match the
examples, which will result in points being deducted from your homework.
Note in the provided test data that the class of any single special character token is printed as the
characters itself, and the class of any multi-character token is printed as an uppercase string. Follow
this example in your program to produce results that are the same as the provided examples.
2. Deliverables and Submission
Your homework will be submitted as an uncompressed .tar file that contains no subdirectories. Your .tar
file will contain at least the following items:
1. A file named parser.l that contains your flex code.
2. A file named parser.y that contains your bison code.
3. A file named scanType.h that contains the declaration of either a struct or class that is used to pass
your token information back from the scanner. This file must be #included in the right place in your
flex and bison files. An example of the declaration of a struct that can be used to pass token
information back from the scanner is shown below:
struct TokenData {
int tokenclass; // token class
int linenum; // line where found
char *tokenstr; // what string was actually read
char cvalue; // any character value
int nvalue; // any numeric value or Boolean value
char *svalue; // any string value e.g. an id
};
4. A makefile (note the all lowercase) that will be executed to build your c-. The name of this file must
be in all lowercase letters in order to work correctly on the test machine.
The uncompressed .tar file is submitted to the class submission page. You can submit as many times as
you like. The last file you submit before the deadline will be the only one graded. No late
submissions are accepted for this assignment. For all submissions you will receive email at your
uidaho address showing how your file performed on the pre-grade tests. The grading program will use
more extensive tests, so thoroughly test your program with inputs of your own.
Your code must compile successfully and have no shift/reduce or reduce/reduce conflicts in bison.
Your code must run with no runtime errors such as segmentation violations (SIGSEGVs).

CS 445 – Homework 2

1. The problem In the last assignment you built a scanner for C- in flex and a “driver” in bison that printed out information about the tokens produced by the scanner. For this assignment you are to modify the “driver” of the last assignment in order to produce a parser for C-. Your parser will construct the abstract syntax tree (AST) corresponding to the input C- program. Building your parser should consist of the following steps: 1. Build a C- recognizer 2. Extend your C- recognizer to add the AST 3. Add user interface options Your completed parser must produce the same AST as is specified in the provided examples. If your parser does not produce the same AST then your parser is flawed. 1.1 Building the Recognizer A recognizer for a particular programming language (PL) simply determines whether or not a given program is a member of the PL that is generated by the grammar for the PL. Use the C- grammar provided on the course website to build a grammar that is suitable for consumption by bison. You will need to fix the dangling-else ambiguity and any other ambiguities or associativity/precedence errors that are currently in the C- grammar. You may not use any bison directives such as %assoc, %left, %right, or %expect in fixing these errors. After you have learned to fix errors “by hand” you can venture into the realm of trying to figure out what the tools are doing to you when they are supposed to be fixing errors for you. But, for now, you are to fix any errors by hand. Once you have built a bison-ready C- grammar that contains no shift/reduce or reduce/reduce conflicts, you have successfully built a recognizer that needs to be tested. Test your recognizer by passing legal C- files into it and observing your recognizer. If your recognizer doesn’t emit a message indicating that there was an error (e.g. an unrecognized character or missing paren,) your recognizer recognized that the C- program is legal. If your recognizer emits an error message or behaves in an abnormal way (e.g. SIGSEGV) your grammar is incorrect and needs to be repaired. Once you have successfully tested your recognizer on all provided C- samples and are confident that it will recognize any legal C- program, it is time to extend the recognizer to build a parser. 1.2.1 Adding the AST Typically compilers build a modified form of syntax tree known as an abstract syntax tree (AST) instead of building a full parse tree for a given program. An AST is like a distilled form of parse tree that only contains the information that we need to continue the compilation process. Your parser is to construct an AST for any legal C- program. An AST is of course just a tree, and the tree will contain nodes representing information that was gathered about the input C- program. The following is a sample AST node structure that has been previously used in this course: typedef struct treeNode { // connectivity in the tree struct treeNode *child[MAXCHILDREN]; // children of the node struct treeNode *sibling; // siblings for the node // what kind of node int lineno; // linenum relevant to this node NodeKind nodekind; // type of this node union // subtype of type { DeclKind decl; // used when DeclK StmtKind stmt; // used when StmtK ExpKind exp; // used when ExpK } subkind; // extra properties about the node depending on type of the node union // relevant data to type -> attr { OpKind op; // type of token (same as in bison) int value; // used when an integer constant or boolean unsigned char cvalue // used when a character char *string; // used when a string constant char *name; // used when IdK } attr; ExpType expType; // used when ExpK for type checking bool isArray; // is this an array bool isStatic; // is staticly allocated? // even more semantic stuff will go here in later assignments. } TreeNode; This structure was pulled from the Louden textbook and is just a suggestion. You can use whatever you want, as long as you can reliably represent the necessary information in memory and can produce the same results as in the provided examples. The following type definitions are used by the TreeNode structure above, and were also pulled from the Louden textbook: // Kinds of Operators // these are the token numbers for the operators same as in flex typedef int OpKind; // Kinds of Statements //typedef enum {DeclK, StmtK, ExpK} NodeKind; enum NodeKind {DeclK, StmtK, ExpK}; // Subkinds of Declarations enum DeclKind {VarK, FuncK, ParamK}; // Subkinds of Statements enum StmtKind {NullK, IfK, WhileK, ForK, CompoundK, ReturnK, BreakK, RangeK}; // Subkinds of Expressions enum ExpKind {OpK, ConstantK, IdK, AssignK, InitK, CallK}; // ExpType is used for type checking (Void means no type or value, UndefinedType means undefined) enum ExpType {Void, Integer, Boolean, Char, CharInt, Equal, UndefinedType}; // What kind of scoping is used? (decided during typing) enum VarKind {None, Local, Global, Parameter, LocalStatic}; Because this design was borrowed from the Louden textbook, you are free to use the example from the Tiny language compiler in the book (a link for the code for this is on the course website) as a reference to study and build from. You are free to use a C++ class rather than a struct to represent a TreeNode. You can use any structure that you like as long as your C- parser produces the same output as in the provided examples. Building a correct AST is of critical importance because all future assignments will require successful construction and traversal of the AST corresponding to a given C- program. Any bugs that you produce in the construction and traversal of your AST are going to follow you for the remainder of the semester, so be Very Careful. To encode a C- program as an AST, you will need to construct the right nodes at the right times during the parsing process. When you need to construct a node, you should use functions that you build like the newStmtNode function in the util.c file for the Tiny language compiler in the book (a link for the code for this is on the course website). These nodes will be passed up the tree as pointers and assembled to build the AST like in the example for the Tiny language in the Louden textbook. The following is an example to use for specifying the set of types that can represent a yylval in your bison file (this was discussed in class). %union { ExpType type; // For passing types (i.e pass a type in a decl like int or bool) TokenData *tokenData; // For terminals. Token data comes from yylex() in the $ vars TreeNode * tree; // For nonterminals. Add these nodes as you build the tree. } This is just a suggestion of what could be in your %union. Note, however, that you are expressly forbidden from accessing YYSTYPE as is done in the Louden book. Using YYSTYPE directly subverts type checking facilities and is too risky for you in this course. You should, as mentioned earlier, build functions to construct different types of TreeNode that you want to put in your AST. The following are prototypes that can be used for these functions. If you examine the Tiny compiler code you will see that this is how TreeNode children are added to a given TreeNode in almost all cases (the initialization of a var is one exception). The use of default values for arguments in these prototypes simplifies the construction of a given TreeNode and also simplifies the corresponding code for the action in your bison file. TreeNode *newDeclNode(DeclKind kind, ExpType type, TokenData* token=NULL, TreeNode* c0=NULL, TreeNode* c1=NULL, TreeNode* c2=NULL); TreeNode *newStmtNode(StmtKind kind, TokenData* token, TreeNode* c0=NULL, TreeNode* c1=NULL, TreeNode* c2=NULL); TreeNode *newExpNode(ExpKind kind, TokenData* token, TreeNode* c0=NULL, TreeNode* c1=NULL, TreeNode* c2=NULL); Note: Once you have your parser working correctly, make sure that you modify your parser to rename the unary – (minus) operator to the CHSIGN token and rename the unary * (asterisk) operator to the SIZEOF token. There is no need to leave these operators untokenized after we can differentiate between their use in a unary or binary context. 1.2.2 Printing the AST Once you have successfully constructed an AST for a given program, you will need to print the AST in order to verify that your AST is correct. How do you know that your AST is correct? The structure of a correct AST for a given program is dictated by your bison grammar. You can also compare your printed AST with examples that are provided in the test dataset for this assignment. To print your AST, you should make a function in your code called printTree. Your printTree function must print out the AST precisely as it is printed in the provided examples. Almost the same format is not good enough. The printTree function will be called after the AST has been constructed by your bison-generated parser. Your main() function in your bison file will look something like this: int main(int argc, char* argv[]) { // some stuff here… // Call yyparse to build the AST and put its address in // the “global” var named syntaxTree of type TreeNode* yyparse(); if (printTreeFlag) // Print the contents of the AST to stdout printTree(stdout, syntaxTree); // other stuff here… } The printTreeFlag variable in the example above is one of the user interface options that you are going to add for this assignment, and is discussed later in this document. The printTree function prints the important information within a given node, and then recursively applies the printTree function to all of the non-NULL children of the node (starting with child[0]), and then recursively applies the printTree function to the sibling pointer if it is not NULL. The first sibling found is numbered 1. Reading the AST provided for given examples is a good way to figure out what to do. For example, given the following C- program as input: ## C-S22 int main() begin int x; int y[5]; bool b; b <- true or false and 11 + 22 * 33 / 44 > 55; if 666>777 then begin x <- 111; end else y[3] <- x+222; while x>999 do begin x <- 333; if x<9 then break; x <- 444; break; end for z <- 1 to 8 do if z>x then x <- z; end Your c- (the program containing the C- scanner, parser, and print routines) will scan the input, produce an AST, and print the following output: Func: main returns type int [line: 2] . Child: 1 Compound [line: 3] . . Child: 0 Var: x of type int [line: 4] . . Sibling: 1 Var: y of array of type int [line: 5] . . Sibling: 2 Var: b of type bool [line: 6] . . Child: 1 Assign: <- [line: 8] . . . Child: 0 Id: b [line: 8] . . . Child: 1 Op: or [line: 8] . . . . Child: 0 Const true [line: 8] . . . . Child: 1 Op: and [line: 8] . . . . . Child: 0 Const false [line: 8] . . . . . Child: 1 Op: > [line: 8] . . . . . . Child: 0 Op: + [line: 8] . . . . . . . Child: 0 Const 11 [line: 8] . . . . . . . Child: 1 Op: / [line: 8] . . . . . . . . Child: 0 Op: * [line: 8] . . . . . . . . . Child: 0 Const 22 [line: 8] . . . . . . . . . Child: 1 Const 33 [line: 8] . . . . . . . . Child: 1 Const 44 [line: 8] . . . . . . Child: 1 Const 55 [line: 8] . . Sibling: 1 If [line: 10] . . . Child: 0 Op: > [line: 10] . . . . Child: 0 Const 666 [line: 10] . . . . Child: 1 Const 777 [line: 10] . . . Child: 1 Compound [line: 10] . . . . Child: 1 Assign: <- [line: 11] . . . . . Child: 0 Id: x [line: 11] . . . . . Child: 1 Const 111 [line: 11] . . . Child: 2 Assign: <- [line: 13] . . . . Child: 0 Op: [ [line: 13] . . . . . Child: 0 Id: y [line: 13] . . . . . Child: 1 Const 3 [line: 13] . . . . Child: 1 Op: + [line: 13] . . . . . Child: 0 Id: x [line: 13] . . . . . Child: 1 Const 222 [line: 13] . . Sibling: 2 While [line: 15] . . . Child: 0 Op: > [line: 15] . . . . Child: 0 Id: x [line: 15] . . . . Child: 1 Const 999 [line: 15] . . . Child: 1 Compound [line: 15] . . . . Child: 1 Assign: <- [line: 16] . . . . . Child: 0 Id: x [line: 16] . . . . . Child: 1 Const 333 [line: 16] . . . . Sibling: 1 If [line: 17] . . . . . Child: 0 Op: < [line: 17] . . . . . . Child: 0 Id: x [line: 17] . . . . . . Child: 1 Const 9 [line: 17] . . . . . Child: 1 Break [line: 17] . . . . Sibling: 2 Assign: <- [line: 18] . . . . . Child: 0 Id: x [line: 18] . . . . . Child: 1 Const 444 [line: 18] . . . . Sibling: 3 Break [line: 19] . . Sibling: 3 For [line: 22] . . . Child: 0 Var: z of type int [line: 22] . . . Child: 1 Range [line: 22] . . . . Child: 0 Const 1 [line: 22] . . . . Child: 1 Const 8 [line: 22] . . . Child: 2 If [line: 22] . . . . Child: 0 Op: > [line: 22] . . . . . Child: 0 Id: z [line: 22] . . . . . Child: 1 Id: x [line: 22] . . . . Child: 1 Assign: <- [line: 22] . . . . . Child: 0 Id: x [line: 22] . . . . . Child: 1 Id: z [line: 22] Note that this output, including the dots to indicate indenting levels, is what your program output is going to be compared against. Your c- must produce this exact same output for this particular input. When building your AST, if there is an optional expression or statement that is absent, be sure to set the corresponding child pointer to NULL. For example, compound statements can contain declarations, but sometimes they don’t. If a compound statement doesn’t contain declarations, then the child[0] pointer for that compound statement should be set to NULL. As another example, a return statement in C- may or may not contain an expression. If a given return statement doesn’t contain an expression, the child[0] pointer corresponding to the TreeNode for the return statement should be set to NULL. The default value for all pointers, including unnecessary children and siblings, should always be NULL. This means that your code must set these pointers to NULL and consistently check the value of a pointer before attempting to dereference the pointer. The bison code in the Tiny compiler example source code (on the course website) is a good example of how to connect the TreeNode nodes that you construct. The calculator example that we showed in class is a good example of how to organize the interaction between the scanner (flex) and the parser (bison). The following are two examples of functions that can be used to connect sibling pointers and to pass information down a list of siblings. These functions also demonstrate the use of some of the type variables in struct TreeNode. // Adds a TreeNode to a list of siblings. // Adding a NULL node to the sibling list is probably a programming error! TreeNode *addSibling(TreeNode *t, TreeNode *s) { if (s==NULL) { printf(“ERROR(SYSTEM): never add a NULL to a sibling list.\n”); exit(1); } if (t!=NULL) { TreeNode *tmp; tmp = t; while (tmp->sibling!=NULL) tmp = tmp->sibling; tmp->sibling = s; return t; } return s; } // Passes the isStatic and type attributes down the sibling list. void setType(TreeNode *t, ExpType type, bool isStatic) { while (t) { t->expType = type; t->isStatic = isStatic; t = t->sibling; } } 1.3 User Interface Options Once you have successfully built your parser, you are ready to make changes to the interface of your cprogram, which is now starting to look like a compiler. Your program must be named c- (note the lowercase) as was the case in the last assignment. Your c- must be able to read the stream of tokens contained within a file whose name is passed as an argument on the command line, or to read the stream of tokens from standard input (stdin) if no argument is passed on the command line. This is the same behavior that was present in the last assignment. 1.3.1 The -p Option Your c- will always construct an AST for a legal C- program. If the user specifies the -p option on the command line, your c- will print the AST that it built to stdout. For example, if the user types the following command: $bash> c- -p mergeSort.cYour c- will build the AST for the program in mergeSort.c- and will print the AST to stdout. Whereas if the user types the following command: $bash> c- mergeSort.c- Your c- will build the AST for the program in mergeSort.c- but will not print the AST to stdout. The -p option will set a variable named printTreeFlag (as shown earlier in this assignment) so that you can determine whether or not to print the AST after it has been built. The -p option is the first of many options that we will be adding to the compiler interface during the semester. There is a Unix function called getopt that is used to handle command line arguments that you are free to use. A custom version called ourgetopt is provided on the course website. This custom version is much more portable than standard getopt implementations, and is provided so that you can reap the benefits of getopt while building your compiler on platforms other than Unix. 1.3.2 The -d Option Your c- will also take the -d option on the command line. This option enables debugging of the parser by setting the yydebug flag in bison to 1. For example if the user enters the following command: $bash> c- -d mergeSort.cYour -c will build an AST for the program in mergeSort.c and information about what is happening during the parsing of the file will be printed to stdout. The information that is printed will automatically be printed by the parser generated by bison once you set yydebug to 1. If the user doesn’t specify the -d option, no information about what is happening during parsing will be printed. In order to access the yydebug flag that is inside the parser generated by bison, be sure to declare it as an external integer (extern int yyedebug) inside your code. 2. Deliverables and Submission Your homework will be submitted as an uncompressed .tar file that contains no subdirectories. Your .tar file must contain all files and code necessary to build and test your compiler. If you use ourgetopt, be sure to include the ourgetopt files in your .tar archive. The .tar file is submitted to the class submission page. You can submit as many times as you like. The last file you submit before the deadline will be the only one graded. No late submissions are accepted for this assignment. For all submissions you will receive email at your uidaho address showing how your file performed on the pre-grade tests. The grading program will use more extensive tests, so thoroughly test your program with inputs of your own. Your code must compile successfully and have no shift/reduce or reduce/reduce conflicts from bison. Your program must run with no runtime errors such as segmentation violations (SIGSEGVs).

CS 445 – Homework 3

1. The problem
In this assignment we will type expressions in the abstract syntax tree (AST) and start to perform
semantic analysis and error generation. We will also check to ensure that variables are declared before
being used and be able to warn when a variable might not be initialized before being used.
1.1 New Compiler Options
For this assignment your compiler will handle five command-line options. Some of these are repeats
from the last assignment. The following are the command-line options that your compiler will handle:
1. The -d option which sets the value of the yydebug variable to 1. This is a repeat from the last
assignment.
2. The -D option which turns on symbol table debugging. Enabling this option will cause a line of
information to be printed for each action that your compiler performs with the symbol table. If
you are using the provided symbolTable.cpp code, just enable debugging in the SymbolTable
instance that your compiler creates.
3. The -p option which is a repeat from the last assignment.
4. The -P option which prints the AST with type information added. This information is printed
for symbols at their declaration and their use. The symbolTable.cpp code will assist with this
option.
5. The -h option which will print the following usage message:
usage: -c [options] [sourcefile]
options:
-d – turn on parser debugging
-D – turn on symbol table debugging
-h – print this usage message
-p – print the abstract syntax tree
-P – print the abstract syntax tree plus type information
Your c- must continue to accept a single input file from either a file name provided on the commandline or redirected from standard input.
1.2 Reorganize Your Code
Put your semantic analysis code in semantic.cpp and semantic.h. If you use the ourgetopt code,
place that in ourgetopt.cpp and ourgetopt.h. Update your makefile to build these targets by putting
their .o files in the dependency list for building c- and the line used to build c- with g++. You may find
it useful and educational to put your main() and associated routines in their own main.cpp file if you
haven’t done so already. This is not required.
1.3 Semantic Errors
We want to generate errors that are tagged with line numbers that are useful to the user. In order to
accomplish this, you will need to ensure that each node is tagged with the correct line number. To do
this effectively you will need to grab the line number as soon as possible (in flex) and associate it with
a given token. This can be done nicely (and portably) by passing a struct/class for each token from the
scanner to the parser in the yylval. This should contain all the information about the token such as the
line number, lexeme, constant value, and token class. You are probably already doing this. Recall that
you should avoid using pointers to global yy variables for token information because the parser
generated by yacc looks ahead and will possibly overwrite the variable that you are pointing to with
new information.
Once the information is passed in some form of struct/class, this information can then be stored in
nodes within your AST. This information can then used when traversing the AST to perform semantic
analysis.
1.4 Scope and Type Checking
After checking if you should print the AST (just like in the last assignment) you will now traverse the
AST and search for typing and program structure errors. Your new main() might look something like
this:
numErrors = 0;
numWarnings = 0;
yyparse();
if (numErrors == 0)
{
// check -p option

if (printSyntaxTree)
// print only info for declarations
printTree(syntaxTree, NOTYPES);
symbolTable = new SymbolTable; // instantiate the symbol table

// perform semantic analysis (may find errors when doing this)
semanticAnalysis(syntaxTree, symbolTable);
// check -P option
if (printAnnotatedSyntaxTree)
// print type info for all types
printTree(syntaxTree, TYPES);
// code generation will eventually go here…
}

// report number of errors and warnings
printf(“Number of errors: %d\n”, numErrors);
printf(“Number of warnings: %d\n”, numWarnings);
Or your main() might look quite different. You will likely want to perform more initialization before
you call yyparse(). The semanticAnalysis routine will process the AST by calling a treeTraverse
routine that will start at the root node of the AST and recursively call itself for children and siblings
until it reaches the leaf nodes in the tree. Declarations encountered during this process will make
entries in the symbol table (the symbol table is is discussed below) and references to symbols will be
checked by looking up the symbol name in the symbol table. Your goal in writing the treeTraverse
routine is to catch a variety of warnings and errors that are indicated in the example output files for this
assignment. You will need to duplicate the output presented in these example output files exactly for
each corresponding input file.
You should keep track of the number of errors and warnings and print them at the end of a run. The list
of errors and warnings that your c- should emit are shown below as format specifiers in printf format.
“ERROR(%d): ‘%s’ is a simple variable and cannot be called.\n”
“ERROR(%d): ‘%s’ requires both operands be arrays or not but lhs is%s an array and
rhs is%s an array.\n”
“ERROR(%d): ‘%s’ requires operands of %s but lhs is of %s.\n”
“ERROR(%d): ‘%s’ requires operands of %s but rhs is of %s.\n”
“ERROR(%d): ‘%s’ requires operands of the same type but lhs is %s and rhs is %s.\n”
“ERROR(%d): Array ‘%s’ should be indexed by type int but got %s.\n”
“ERROR(%d): Array index is the unindexed array ‘%s’.\n”
“ERROR(%d): Cannot index nonarray ‘%s’.\n”
“ERROR(%d): Cannot return an array.\n”
“ERROR(%d): Cannot use function ‘%s’ as a variable.\n”
“ERROR(%d): Symbol ‘%s’ is already declared at line %d.\n”
“ERROR(%d): Symbol ‘%s’ is not declared.\n”
“ERROR(%d): The operation ‘%s’ does not work with arrays.\n”
“ERROR(%d): The operation ‘%s’ only works with arrays.\n”
“ERROR(%d): Unary ‘%s’ requires an operand of %s but was given %s.\n”
“ERROR(LINKER): A function named ‘main()’ must be defined.\n”);
“WARNING(%d): The variable ‘%s’ seems not to be used.\n”
“WARNING(%d): Variable ‘%s’ may be uninitialized when used here.\n
To get a perfect match to the expected output it helps immensely if you just copy these format strings
and use them in your code. The type string that you replace in the message is often something like
“type int” or “unknown type”. The messages listed above are exactly the errors/warnings that you
must catch for this assignment. There are 16 error messages and 2 warning messages for this
assignment. In later assignments we will add more.
1.5 Type Checking for Specific AST Nodes
This section contains details regarding some errors that you should be checking for. This list is not
exhaustive. You can do whatever you like as long as your output matches the example output file
corresponding to a given input.
1.5.1 Declarations
For declarations, check to ensure that duplicate declarations do not exist. You can use the symbol table
to help with this. Recall that the arguments to a function in C- are contained in the outermost scope of
the function in which they appear (the first compound statement in the function definition). For
example, this C- code
int func(int x) begin int x; end
contains a duplicate declaration of the symbol x, whereas this C- code
int func(int x) begin begin int x; end end
does not contain a duplicate declaration since the innermost compound statement is a different scope
than the outermost compound statement. This is the way that C++ works.
1.5.2 Compound Statements
A new scope needs to be created for each compound statement. The symbolTable.cpp code that is
provided will permit you to created and destroy scopes, enter symbols in the symbol table, and check to
see if a symbol is already contained in the table. For documentation purposes, you can ascribe a label of
your choice to a given scope using the SymbolTable::enter method. For example, enter(“Compound
Statement”) will create a new scope with the label “Compound Statement” associated with it. If you
enable debugging in the symbol table, using a distinct identifier for each scope name may make the
debugging information more readable.
1.5.3 Assignments and Other Operators
Your code should check to ensure that types provided for assignments and other operators are correct.
The type information associated with an expression will have to be passed up the tree (synthetic
attributes) so that they can be checked by the operators that use it. Note that many operators have an
expected type that they produce. The + operation, for example, will produce a type int. Be sure to set
this sort of information as you traverse the tree. It might be useful to have an “undefined type” or
“placeholder type” for use when the type of an entity is undefined or is not yet known.
Consider using an array or clever function to determine what types are required for operands to specific
operators rather than using nested if statements. Use this same strategy to determine what type the
result of an operation will be. The number of cases required for type checking has been intentionally
limited, making it easier to design functions for this purpose. The types of all operands and return
types for all C- operators are contained in the C- grammar document on the course website. Note the
following:
1. The = and != operators take operands that are the same type and produce a value of type
Boolean. The operands to both of these operators can be arrays.
2. The <- operator takes operands that are of the same type and produces the type of the lhs. This
means that the <- operator will produce the type of the lhs regardless of whether or not it is
undefined. This is because assignment in C- is an expression and can be used in cascaded
assignment like: a <- b <- c <- 314159265; (An intentional and interesting side note is that an
assignment is NOT a simpleExp! How does that effect what you can put in say… an if statement
predicate? Python does this sort of thing.)
3. The ++ and – – operators in C- take an integer operand and immediately produce an integer
result. This is different than the way that they behave in C/C++.
1.5.4 Identifiers
All C- identifiers must be unique, and variables must be declared before they are used. Your code must
check to see if a variable has been declared. If a referenced identifier has been used in a declaration,
set the type of the identifier node to the type of the declaration and associate a pointer to the declaration
node with the symbol table information for that identifier. This is very useful because it permits you to
quickly find your way to the declaration node of a given variable and allows you to store all of the
information about a given variable with its declaration. If the declaration associated with a given
identifier cannot be found, set the type of the identifier to an “unknown type” or some other indicator
that the type information is missing. To prevent cascading errors, undefined types do not provoke an
error when being compared with an expected type (e.g., int produced by the + operator). We assume
that an error was generated when a given identifier was marked as being of an unknown or undefined
type. For this assignment, each reference to an undeclared identifier must generate an error message.
This will likely be changed later.
For this assignment you must issue a warning for the possibility that a variable is being used before it
was initialized. This applies strictly to locally declared variables; not globals, statics, or parameters. If
the rhs value of a variable is used before (appears in the tree traversal before) it has been initialized or
appears on the lhs of an assignment, a warning must be issued. This means that an identifier being on
the lhs in a binary assignment or being initialized on declaration will cause the variable to be marked as
initialized.
For identifiers, C- permits arrays that are indexed. An identifier used in this form becomes a non-array
type because it has been indexed. This means that type type of the AST ‘[‘ node is the type of the lhs
of the operation. Be sure to check for attempting to index an identifier that is a non-array type and
using identifiers of an array type in a non-indexed manner where this is illegal. Identifiers that are
arrays can be prefixed with the * operator to determine the number of items in the array. This operator
produces an int.
1.5.5 Functions
Functions in C- cannot return an entire array. Your c- must check for this and ensure that is not
permitted. Functions in C- can have a return type of void. This means that the specified function
cannot return a value. An implication of this is that it is possible for the type void to appear in an error
message. All C- programs must contain a function named main. If, after processing the entire tree
representing a program, there is no function named main in the symbol table, you must print an error
message of the form indicated earlier in this assignment.
1.6 The Symbol Table
A symbol table that you can use in your program is available on the course website. The version
provided uses the C++ standard template library. Feel free to augment it or to build your own. Though
the version provided uses the std::string type as the argument type in many places, you can convert
a const char * to and from a std::string. The SymbolTable::insert method allows you to
associate a pointer with a symbol in the table. This pointer could be used to associate a symbol name
with the tree node where the symbol was declared. The SymbolTable::enter and
SymbolTable::leave methods allow you to manage the stack of scopes associated with a program.
The scopes within a SymbolTable are always managed via these methods. The SymbolTable::insert
method will return false if you attempt to insert a symbol that has already been defined. The
SymbolTable::lookup method will return NULL if you attempt to look for a symbol that is not defined.
One feature of the symbol table is the debug flag. At construction time the SymbolTable object is in
non-debugging mode. But by setting the flag with the SymbolTable::debug method you can get the
symbol table to print out information. You can also just print the symbol table if using the provided
SymbolTable::print method. You might consider starting out by printing the symbol table when
exiting from a scope using the debug flag.
The SymbolTable::print method takes a print function that will print a tree node. So if you define
something to print the data in a node given a TreeNode* then you can supply the name of that print
function to the SymbolTable::print method to print out your symbol table stack. That way the code
doesn’t have to know what your TreeNode looks like internally. For instance
symtab = new SymbolTable();
creates the symbol table. If you have defined a function to print a tree node with a prototype like this:
void nodePrint(void *p)
You can print the symbol table by just doing this:
symtab->print(nodePrint);
which will print each void * in the symbol table using your supplied nodePrint function.
2. Deliverables and Submission
Your homework will be submitted as an uncompressed .tar file that contains no subdirectories. Your .tar
file must contain all files and code necessary to build and test your compiler. If you use ourgetopt, be
sure to include the ourgetopt files in your .tar archive.

The .tar file is submitted to the class submission page. You can submit as many times as you like. The
last file you submit before the deadline will be the only one graded. No late submissions are
accepted for this assignment. For all submissions you will receive email at your uidaho address
showing how your file performed on the pre-grade tests. The grading program will use more extensive
tests, so thoroughly test your program with inputs of your own. The error messages that your c- emits
will be sorted before comparison with the expected output so that the order in which your messages are
printed is not as important as it otherwise would be.
Your code must compile successfully and have no shift/reduce or reduce/reduce conflicts from bison.
Your program must run with no runtime errors such as segmentation violations (SIGSEGVs).

CS 445 – Homework 4

1. The problem
In this assignment we will continue semantic analysis and error generation. We will also add I/O runtime library support.
1.1 New Compiler Messages
For this assignment your compiler will detect more semantic errors/warnings and emit messages for
them. You will be graded on detecting all errors/warnings and emitting messages, including those that
are from the last homework, so be sure that you complete the semantic analysis and error message
generation that was included in the last homework. The list of all errors/warnings to detect and the
messages that your compiler will emit for them is shown below:
“ERROR(%d): ‘%s’ is a simple variable and cannot be called.\n”
“ERROR(%d): ‘%s’ requires both operands be arrays or not but lhs is%s an array and
rhs is%s an array.\n”
“ERROR(%d): ‘%s’ requires operands of %s but lhs is of %s.\n”
“ERROR(%d): ‘%s’ requires operands of %s but rhs is of %s.\n”
“ERROR(%d): ‘%s’ requires operands of the same type but lhs is %s and rhs is %s.\n”
“ERROR(%d): Array ‘%s’ should be indexed by type int but got %s.\n”
“ERROR(%d): Array index is the unindexed array ‘%s’.\n”
“ERROR(%d): Cannot have a break statement outside of loop.\n”
“ERROR(%d): Cannot index nonarray ‘%s’.\n”
“ERROR(%d): Cannot return an array.\n”
“ERROR(%d): Cannot use array as test condition in %s statement.\n”
“ERROR(%d): Cannot use array in position %d in range of for statement.\n”
“ERROR(%d): Cannot use function ‘%s’ as a variable.\n”
“ERROR(%d): Expecting %s in parameter %i of call to ‘%s’ declared on line %d but
got %s.\n”
“ERROR(%d): Expecting %s in position %d in range of for statement but got %s.\n”
“ERROR(%d): Expecting Boolean test condition in %s statement but got %s.\n”
“ERROR(%d): Expecting array in parameter %i of call to ‘%s’ declared on line %d.\n”
“ERROR(%d): Function ‘%s’ at line %d is expecting no return value, but return has a
value.\n”
“ERROR(%d): Function ‘%s’ at line %d is expecting to return %s but return has no
value.\n”
“ERROR(%d): Function ‘%s’ at line %d is expecting to return %s but returns %s.\n”
“ERROR(%d): Initializer for variable ‘%s’ is not a constant expression.\n”
“ERROR(%d): Initializer for variable ‘%s’ of %s is of %s\n”
“ERROR(%d): Initializer for variable ‘%s’ requires both operands be arrays or not
but variable is%s an array and rhs is%s an array.\n”
“ERROR(%d): Not expecting array in parameter %i of call to ‘%s’ declared on line
%d.\n”
“ERROR(%d): Symbol ‘%s’ is already declared at line %d.\n”
“ERROR(%d): Symbol ‘%s’ is not declared.\n”
“ERROR(%d): The operation ‘%s’ does not work with arrays.\n”
“ERROR(%d): The operation ‘%s’ only works with arrays.\n”
“ERROR(%d): Too few parameters passed for function ‘%s’ declared on line %d.\n”
“ERROR(%d): Too many parameters passed for function ‘%s’ declared on line %d.\n”
“ERROR(%d): Unary ‘%s’ requires an operand of %s but was given %s.\n”
“ERROR(LINKER): A function named ‘main’ with no parameters must be defined.\n”);
“WARNING(%d): Expecting to return %s but function ‘%s’ has no return statement.\n”
“WARNING(%d): The function ‘%s’ seems not to be used.\n”
“WARNING(%d): The parameter ‘%s’ seems not to be used.\n”
“WARNING(%d): The variable ‘%s’ seems not to be used.\n”
“WARNING(%d): Variable ‘%s’ may be uninitialized when used here.\n”
Note that the majority of these messages were contained in homework 3, so if you correctly completed
that homework you have less to accomplish for this homework. As with the last homework, your goal
is to replicate the output shown in the example files that are contained in the course website. You
should continue to keep a count of the number of errors and warnings that were encountered while
processing a given input.
1.2 Adding I/O Runtime Support
For this assignment you are to add nascent support for I/O runtime library routines that will be used in
C- programs. Since your compiler doesn’t generate code yet, you will just need to add logic to detect
semantic errors in C- code that attempts to use these I/O routines. Declarations (prototypes) for these
routines, presented in C- syntax, are shown below:
void output(int)
void outputb(bool)
void outputc(char)
int input()
int inputb()
int inputc()
void outnl()
In order to detect errors in C- code that refers to these routines, you will need to load information about
these routines into the symbol table before you parse any C- program that might use them. One way to
accomplish this is to build a separate AST containing just information about these routines, then
traverse this separate AST using your tree traversal routine to load the information into the symbol
table. Once this is accomplished, you can then traverse the AST of the full program using the same tree
traversal routine and the symbol table information loaded in the first scan will be available to be used
during semantic analysis of the full program. A graph of the AST containing the information about the
I/O routines is shown below.
In the graph above, all parameters are named “*dummy*”. When performing semantic analysis of Ccode that uses these routines, associate line number -1 with each of the I/O routines, and feel free to
continue to associate the “*dummy*” name with any formal parameters for these routines. Remember
that you are not generating code for these routines, you are just detecting semantic errors/warnings in
code that attempts to use these routines.
1.3 Semantic Checks
• All while and if statements must be checked to ensure that they contain predicates (or
conditions) that are of type bool.
• All for statements must be checked to ensure that all expressions used in a range are of type int.
• The types of all operators (including assignment) must be checked to ensure that all operands
are of the correct type. The types associated with expressions will have to be passed up to
parent nodes (synthesized attributes). This should have been accomplished in homework 3.
• The type of any expression used in a return statement must be checked to ensure that it
conforms with the return type of the function in which it appears.
• The number and types of all function arguments (actual parameters) must be checked to ensure
that they are correct. This involves comparing the number of formal parameters with the
number of actual parameters, and comparing the type of each formal parameter to the function
with the type of each corresponding actual parameter to ensure that the types match. To
accomplish this, you should use the symbol table to store/retrieve the TreeNode pointer
associated with a given function definition.
• Functions that return a value must be checked to determine that they contain a return
statement. It is much more difficult to determine whether all control paths in a function return a
value, so for now just check to determine whether or not the function contains any return
statement and issue a warning if none is detected. Functions that do not return a value can use
an explicit return statement with no value provided, or can just not use a return statement at
all.
• All break statements must be checked to ensure that they appear inside an iteration statement.
• All identifiers must be checked to see if the identifier has already been defined or not, and to
use the symbol table to set the type of the identifier TreeNode to the type of the declaration in
which the identifier appears. If the identifier is undefined, then use an “unknownType” or other
marker to indicate that the type of this identifier, if not matched in a surrounding expression in a
parent TreeNode, will not generate an error (this is to prevent cascading errors). Most of this
should already have been done for homework 3.

• Initializers must be checked to ensure that the value of an initializer is a constant.
• Indexed identifiers (arrays) must be checked to ensure that the identifier being indexed is indeed
an array. Also check to ensure that arrays without used without an index are only used when it
is legal to do so. This should have already been done for homework 3.
• Check to ensure that the unary * operator (sizeof) is only used with arrays.
• Check to ensure that all “global” symbols are used in the program. If a global symbol is not
referenced anywhere in a program, emit a warning of the specified format. Do not emit
warnings of this type for the main function (since most programs won’t call main directly) or
for any of the I/O runtime routines (since they are being added regardless of whether or not the
program refers to them).
2. Deliverables and Submission
Your homework will be submitted as an uncompressed .tar file that contains no subdirectories. Your .tar
file must contain all files and code necessary to build and test your compiler. If you use ourgetopt, be
sure to include the ourgetopt files in your .tar archive.

The .tar file is submitted to the class submission page. You can submit as many times as you like. The
last file you submit before the deadline will be the only one graded. No late submissions are
accepted for this assignment. For all submissions you will receive email at your uidaho address
showing how your file performed on the pre-grade tests. The grading program will use more extensive
tests, so thoroughly test your program with inputs of your own. The error messages that your c- emits
will be sorted before comparison with the expected output so that the order in which your messages are
printed is not as important as it otherwise would be.
Your code must compile successfully and have no shift/reduce or reduce/reduce conflicts from bison.
Your program must run with no runtime errors such as segmentation violations (SIGSEGVs).

CS 445 – Homework 5

1. The problem
In this assignment we will make all error and warning messages have the same format and add
modifications to keep syntax errors from halting syntactic analysis in the compiler. Lots of code is
provided for you to use, so you just need to fold in the code and adapt it to your grammar and token
names. There are several parts to this assignment:
1. Modify your program to improve error messages by using yyerror and error count.
2. Modify your grammar to use the bison error token and the yyerrok macro.
1.1 Improve Error Messages
Using yyerror permits us to catch all errors that are returned from the parser. Here are some examples
of error messages that might be passed into the yyerror function if YYERROR_VERBOSE is set:
syntax error, unexpected ‘+’
syntax error, unexpected ELSE
syntax error, unexpected ‘<-‘, expecting ‘(‘
syntax error, unexpected ‘<-‘, expecting WHILE
syntax error, unexpected ID, expecting ‘(‘
syntax error, unexpected ID, expecting WHILE
syntax error, unexpected ‘+’, expecting ‘,’ or ‘;’
syntax error, unexpected ‘/’, expecting BOOL or INT or VOID
syntax error, unexpected ID, expecting $end or BOOL or INT or VOID
For this part of the assignment we will add a line number, format the error message like it appears in
the syntactic analysis phase, and some extra information to the message where appropriate. The
yyerror.cpp and yyerror.h files for this assignment on the course website can be used to translate these
messages as specified. You can use this code as-is, or improve it if you want as long as your output is
the same as the provided examples. This code contains a tiny sort routine that sorts the expected tokens
so that a uniform error message can be printed regardless of the order of your token declarations in
your parser.y file. Note that this code uses names for tokens that may be different from the names that
your code is using, so you may need to adjust some token names to get the code to work.
If you aren’t enabling the YYERROR_VERBOSE macro in your yacc specification already, you need
to do so in order to get the yyerror.cpp code to work. You can just ust do a #define
YYERROR_VERBOSE in your parser.y file to get it to work. In the yerror.cpp code, the line symbol is
the line number where the parser is located when an error is encountered, and the msg symbol is the
extended error message that comes from the parser when your YYERROR_VERBOSE macro is
enabled.
Your compiler will continue to have two global variables to count the number of errors and warnings
that were encountered. In this assignment we will modify things so that the count of errors and
warnings also includes warnings and errors from the scanner and the syntax analyzer.
Your compiler now has three phases: lexical analysis (scanner), syntax analysis, and semantic analysis.
Each of these phases can produce its own errors. Below is a list of the list of the modifications that are
necessary for this assignment (some of these you may have already completed) for each phase:
• Lexical Analysis: All warnings and errors increment the appropriate counter.
• Syntax Analysis: You will be adding tokens to your bison grammar to allow the parser to match
an error, automatically revert to a “known good state” and continue parsing. You will also add
the yyerrok macro to synchronize as needed. A list of possible edits to your grammar is shown
in the grammarmods.txt document on the course website.
• Semantic Analysis: Your compiler should proceed to the semantic analysis phase only if there
are no errors encountered in the syntax analysis phase. The errors encountered in this phase
will increase the global error count.
Your compiler should continue to keep a running count of all warnings and errors from all phases and
report them at the end of compilation in the order and format shown below:
Number of warnings: 2
Number of errors: 496
1.2 Modifying Your Grammar
For this assignment you will need to modify your grammar to add the error token so that semantic
analysis can continue past errors. This is done to help the user get as much information as possible
about their program in a single compilation. A list of possible edits to your grammar is shown in the
grammarmods.txt file on the course website. The edits are shown in the order in which they appear in
my grammar, which is very similar to the standard C- grammar. In some cases, only the beginning of
the actions associated with a given production are shown. Sometimes the only thing that is added in
this list is the addition of the macro yyerrok, and this means that the yyerrok macro should be used
somewhere in your actions for the corresponding production. Your token names will likely be different
than the ones that are used, but these names will be translated into “nice” names so that your
implementation names are hidden. Some of the productions in your grammar may be slightly different
and that is fine. On the final evaluation of this assignment you will be graded on the actual error
messages that your compiler generates, not where you place your error tokens. Try to come as close
as possible to the output that is provided in the examples. The text of the errors should be exactly the
same. You may miss 10% of the messages or have some extra messages and your score will not be
reduced because of it. Just make sure that the text of the error messages matches exactly.
Once you have added the error token to your grammar bison will report many shift/reduce and
reduce/reduce conflicts. This is expected. Your compiler should not halt due to a syntax error.
2. Deliverables and Submission
Your homework will be submitted as an uncompressed .tar file that contains no subdirectories. Your .tar
file must contain all files and code necessary to build and test your compiler. If you use ourgetopt, be
sure to include the ourgetopt files in your .tar archive.

The .tar file is submitted to the class submission page. You can submit as many times as you like. The
last file you submit before the deadline will be the only one graded. No late submissions are
accepted for this assignment. For all submissions you will receive email at your uidaho address
showing how your file performed on the pre-grade tests. The grading program will use more extensive
tests, so thoroughly test your program with inputs of your own. The error messages that your c- emits
will be sorted before comparison with the expected output so that the order in which your messages are
printed is not as important as it otherwise would be.
Your program must run with no runtime errors such as segmentation violations (SIGSEGVs).

CS 445 – Homework 6

1. The problem
In this assignment we will be preparing to generate code. Your compiler will need to compute the
scope, size, and location of each symbol in an input program. You will add a new -M option to your
compiler to print this information that your compiler has computed. The format that it will be printed
in is shown later in this document.
1.1 Suggested Parser Changes
Your compiler should record the size of each variable encountered at parse time. The size of most
variables is 1 “data word” except for arrays. The size of arrays is the number of items contained in the
array plus 1 data word for holding the size. As a reminder, the data word containing the size of the
array is held in a location before the first element of the array. This means that the size of an array x is
effectively held in location x[-1]. Since arrays are passed by reference, the size of any array passed as a
function argument is 1 data word.
1.2 Suggested Semantic Analysis Changes
Recall that the application binary interface (ABI) that we are using utilizes register R0 to hold the
global pointer and register R1 to hold the current frame pointer at run-time. Your compiler should use
two variables, foffset and goffset, to compute and update these values at compile-time in order to
keep track of the location of symbols within the current input program. You should modify your
TreeNode declaration to contain information about the size, location, and reference type (local, global,
static, or parameter) of each symbol in the input program. The location of a symbol will be the offset
from the appropriate pointer (global or frame) where memory is allocated for the symbol. This offset
should be readily available in the variables foffset and goffset. You should use the symbols table
to store this new information that is associated with each symbol. Once you do this, you will be able to
use the symbols table to determine the location (offset), size, and reference type of all references to a
variable when you encounter them.
You will need to reset the local frame offset foffset each time that a new function is entered so that
the symbols for a particular function are offset from the top of the frame for that particular function.
This will permit re-use of frame locations for different function calls. You should also be sure to leave
room for the “return ticket” (the return frame pointer and the return address) that is located at the top of
the frame for each called function. The result will be that each declaration that requires space will store
the size of that space and the offset (location) of that space in the TreeNode associated with the
declaration. The “reference type” field in your TreeNode will help in deciding where to allocate space
and how to reference that space when generating code. The “reference type” field will hold one of the
four values (local, global, static, or parameter) that will specify which offset register (R0 or R1) to use
at code generation time.
Constants typically don’t require that memory be allocated, as they are often just loaded from
instruction memory (as discussed using the LDC instruction). In the case of constant arrays, however,
you will need to allocate space for them in the global area. The current language only permits constant
arrays of characters, so these will just be global character arrays, and their address is the address of the
first character in the array. The size of these constant arrays is also stored in a data word immediately
before the first item in the array.
Function symbols are global. The offset location of functions will be used later for storing the address
of the first instruction in the function so that we can look up a function in the symbols table and jump
to it when needed. Function offsets should be initialized to zero.
It is assumed that the foffset value will be saved when entering a new scope and restored when
exiting a scope so that frame memory can be reclaimed as discussed in lectures. Be sure that your code
does this. If you don’t get this right, many of your locations will be incorrect.
2. An Example
Your code will be tested on the various inputs provided for this assignment. Each input file will be fed
to your compiler with the command line “c- -M filename” where the -M option is specified to print
an augmented containing tree information about the location and size of symbols. An example of this
is shown below.
2.1 Example Input File
The following is an example C- input file. The lines in this file are numbered for reference. Comments
contained in the file specify the location of symbols and are also for your reference.
1 int g; // loc G 0
2 char h[10]:”dogs”; // loc G -2
3
4 beachDay(int p, q[]) // loc L -2, -3
5 {
6 int a:17; // loc L -4
7 bool b[10]; // loc L -6
8 static int s; // loc G -12
9 static int t[10]; // loc G -14
10 static char u[10]:”corgi”; // loc G -25
11 int z; // loc L -16
12 a;
13 b;
14 s;
15 t;
16 u;
17 666;
18 ‘x’;
19 “purple”;
20 true;
21 }
22
23 main() {
24 int a; // loc L -2
25 {
26 int b, c, d; // loc L -3, -4, -5
27 }
28 {
29 int e, f; // loc L -3, -4
30 }
31 for a <- 1 to 10 do a; // loc L -3
32
33 }
34 // next free space in globals is -35
2.2 Example Output Tree
The following is an example of the augmented tree that will be printed if the input file shown above is
handed to your compiler with the -M command line option. Output lines that are too long to fit within
the dimensions of this document wrap around. Your output lines should not wrap around.
WARNING(13): Variable ‘b’ may be uninitialized when used here.
WARNING(4): The parameter ‘p’ seems not to be used.
WARNING(4): The parameter ‘q’ seems not to be used.
WARNING(11): The variable ‘z’ seems not to be used.
WARNING(26): The variable ‘b’ seems not to be used.
WARNING(26): The variable ‘c’ seems not to be used.
WARNING(26): The variable ‘d’ seems not to be used.
WARNING(29): The variable ‘e’ seems not to be used.
WARNING(29): The variable ‘f’ seems not to be used.
WARNING(24): The variable ‘a’ seems not to be used.
WARNING(4): The function ‘beachDay’ seems not to be used.
WARNING(1): The variable ‘g’ seems not to be used.
WARNING(2): The variable ‘h’ seems not to be used.
Var: g of type int [mem: Global loc: 0 size: 1] [line: 1]
Sibling: 1 Var: h of array of type char [mem: Global loc: -7 size: 11] [line: 2]
. Child: 0 Const “dogs” of array of type char [mem: Global loc: -2 size: 5] [line: 2]
Sibling: 2 Func: beachDay returns type void [mem: Global loc: 0 size: -4] [line: 4]
. Child: 0 Parm: p of type int [mem: Parameter loc: -2 size: 1] [line: 4]
. Sibling: 1 Parm: q of array of type int [mem: Parameter loc: -3 size: 1] [line: 4]
. Child: 1 Compound [mem: None loc: 0 size: -17] [line: 5]
. . Child: 0 Var: a of type int [mem: Local loc: -4 size: 1] [line: 6]
. . . Child: 0 Const 17 of type int [line: 6]
. . Sibling: 1 Var: b of array of type bool [mem: Local loc: -6 size: 11] [line: 7]
. . Sibling: 2 Var: s of static type int [mem: LocalStatic loc: -17 size: 1] [line: 8]
. . Sibling: 3 Var: t of static array of type int [mem: LocalStatic loc: -19 size: 11]
[line: 9]
. . Sibling: 4 Var: u of static array of type char [mem: LocalStatic loc: -36 size: 11]
[line: 10]
. . . Child: 0 Const “corgi” of array of type char [mem: Global loc: -30 size: 6]
[line: 10]
. . Sibling: 5 Var: z of type int [mem: Local loc: -16 size: 1] [line: 11]
. . Child: 1 Id: a of type int [mem: Local loc: -4 size: 1] [line: 12]
. . Sibling: 1 Id: b of array of type bool [mem: Local loc: -6 size: 11] [line: 13]
. . Sibling: 2 Id: s of static type int [mem: LocalStatic loc: -17 size: 1] [line: 14]
. . Sibling: 3 Id: t of static array of type int [mem: LocalStatic loc: -19 size: 11]
[line: 15]
. . Sibling: 4 Id: u of static array of type char [mem: LocalStatic loc: -36 size: 11]
[line: 16]
. . Sibling: 5 Const 666 of type int [line: 17]
. . Sibling: 6 Const ‘x’ of type char [line: 18]
. . Sibling: 7 Const “purple” of array of type char [mem: Global loc: -47 size: 7]
[line: 19]
. . Sibling: 8 Const true of type bool [line: 20]
Sibling: 3 Func: main returns type void [mem: Global loc: 0 size: -2] [line: 23]
. Child: 1 Compound [mem: None loc: 0 size: -3] [line: 23]
. . Child: 0 Var: a of type int [mem: Local loc: -2 size: 1] [line: 24]
. . Child: 1 Compound [mem: None loc: 0 size: -6] [line: 25]
. . . Child: 0 Var: b of type int [mem: Local loc: -3 size: 1] [line: 26]
. . . Sibling: 1 Var: c of type int [mem: Local loc: -4 size: 1] [line: 26]
. . . Sibling: 2 Var: d of type int [mem: Local loc: -5 size: 1] [line: 26]
. . Sibling: 1 Compound [mem: None loc: 0 size: -5] [line: 28]
. . . Child: 0 Var: e of type int [mem: Local loc: -3 size: 1] [line: 29]
. . . Sibling: 1 Var: f of type int [mem: Local loc: -4 size: 1] [line: 29]
. . Sibling: 2 For [mem: None loc: 0 size: -4] [line: 31]
. . . Child: 0 Var: a of type int [mem: Local loc: -3 size: 1] [line: 31]
. . . Child: 1 Range [line: 31]
. . . . Child: 0 Const 1 of type int [line: 31]
. . . . Child: 1 Const 10 of type int [line: 31]
. . . Child: 2 Id: a of type int [mem: Local loc: -3 size: 1] [line: 31]
Offset for end of global space: -53
Number of warnings: 13
Number of errors: 0
Please study the input file and its associated output and make your compiler mimic this output as
closely as possible.
3. Deliverables and Submission
Your homework will be submitted as an uncompressed .tar file that contains no subdirectories. Your .tar
file must contain all files and code necessary to build and test your compiler. If you use ourgetopt, be
sure to include the ourgetopt files in your .tar archive.

The .tar file is submitted to the class submission page. You can submit as many times as you like. The
last file you submit before the deadline will be the only one graded. No late submissions are
accepted for this assignment. For all submissions you will receive email at your uidaho address
showing how your file performed on the pre-grade tests. The grading program will use more extensive
tests, so thoroughly test your program with inputs of your own. The error messages that your c- emits
will be sorted before comparison with the expected output so that the order in which your messages are
printed is not as important as it otherwise would be.
Your program must run with no runtime errors such as segmentation violations (SIGSEGVs).

CS 445 – Homework 7

1. The problem
In this assignment we will be generating three-address TM code corresponding to any legal C- input
program. The three-address TM code generated by your compiler must maintain semantic closure with
the C- input program provided by the user.
1.1 Required Changes
Your compiler must generate three-address TM code for any legal C- program. The compile statement
will generally be of the form:
c- testfile.cwith no options. In this case, your compiler should produce a file named testfile.tm containing the
three-address TM code corresponding to the program contained in testfile.c-. The three-address
TM code produced by your compiler must be able to be interpreted by the TM virtual machine (version
4.6a). The TM is described in a document located on the course website, and the source code for the
TM that will be used to interpret your generated code is also located on the course website. The
document that describes the TM also contains three-address code idioms and examples of specific use
cases.
1.2 Expectations
Now that you have reached the code generation phase, assessment of your compiler will be conducted
differently than in previous assignments. The three-address TM code generated by your compiler will
be interpreted by the TM virtual machine and the results of this interpretation will be compared with
expected results. The data archive provided with this assignment contains many examples that you can
use to test your compiler. In this archive is a script named runExamples that is very similar to the script
that will be run on the test machine. This script contains the logic used to determine whether the threeaddress TM code generated by your compiler behaves correctly for a given input file. If the behavior
of your generated code diverges from that of the expected result, any differences will be printed and
displayed side-by-side as in previous assignments.
You have many examples in the data archive for this assignment. You have the source code for the TM
that will be used to interpret your generated code. You have the runExamples script that will be used to
evaluate the behavior of the code that your compiler generates. Since the TM code generated by your
compiler will not be compared with the TM code generated by a working compiler, you are free to
generate code that contains instrumentation and facilitates debugging while still maintaining semantic
closure with the input program. You should use comments in generated code to increase your
understanding of the source locus for a particular instruction or set of instructions that your compiler
generates. You might also consider including comments that dump state information related to your
compiler associated with a particular block of generated code. The debugging features that are
incorporated into the TM virtual machine will be extremely helpful if used wisely. You are free to use
the emitCode.cpp and emitCode.h contained on the course website to help in generating TM code.
2. Deliverables and Submission
Your homework will be submitted as an uncompressed .tar file that contains no subdirectories. Your .tar
file must contain all files and code necessary to build and test your compiler. If you use ourgetopt, be
sure to include the ourgetopt files in your .tar archive.

The .tar file is submitted to the class submission page. You can submit as many times as you like. The
last file you submit before the deadline will be the only one graded. No late submissions are
accepted for this assignment. For all submissions you will receive email at your uidaho address
showing how your file performed on the pre-grade tests. The grading program will use more extensive
tests, so thoroughly test your program with inputs of your own.
Your program must run with no runtime errors such as segmentation violations (SIGSEGVs). Any
attempts to obfuscate the results of an unsuccessful compilation will be met with the harshest possible
remedy.