CS 445 – Homework 2 solution




5/5 - (4 votes)

1. The problem In the last assignment you built a scanner for C- in flex and a “driver” in bison that printed out information about the tokens produced by the scanner. For this assignment you are to modify the “driver” of the last assignment in order to produce a parser for C-. Your parser will construct the abstract syntax tree (AST) corresponding to the input C- program. Building your parser should consist of the following steps: 1. Build a C- recognizer 2. Extend your C- recognizer to add the AST 3. Add user interface options Your completed parser must produce the same AST as is specified in the provided examples. If your parser does not produce the same AST then your parser is flawed. 1.1 Building the Recognizer A recognizer for a particular programming language (PL) simply determines whether or not a given program is a member of the PL that is generated by the grammar for the PL. Use the C- grammar provided on the course website to build a grammar that is suitable for consumption by bison. You will need to fix the dangling-else ambiguity and any other ambiguities or associativity/precedence errors that are currently in the C- grammar. You may not use any bison directives such as %assoc, %left, %right, or %expect in fixing these errors. After you have learned to fix errors “by hand” you can venture into the realm of trying to figure out what the tools are doing to you when they are supposed to be fixing errors for you. But, for now, you are to fix any errors by hand. Once you have built a bison-ready C- grammar that contains no shift/reduce or reduce/reduce conflicts, you have successfully built a recognizer that needs to be tested. Test your recognizer by passing legal C- files into it and observing your recognizer. If your recognizer doesn’t emit a message indicating that there was an error (e.g. an unrecognized character or missing paren,) your recognizer recognized that the C- program is legal. If your recognizer emits an error message or behaves in an abnormal way (e.g. SIGSEGV) your grammar is incorrect and needs to be repaired. Once you have successfully tested your recognizer on all provided C- samples and are confident that it will recognize any legal C- program, it is time to extend the recognizer to build a parser. 1.2.1 Adding the AST Typically compilers build a modified form of syntax tree known as an abstract syntax tree (AST) instead of building a full parse tree for a given program. An AST is like a distilled form of parse tree that only contains the information that we need to continue the compilation process. Your parser is to construct an AST for any legal C- program. An AST is of course just a tree, and the tree will contain nodes representing information that was gathered about the input C- program. The following is a sample AST node structure that has been previously used in this course: typedef struct treeNode { // connectivity in the tree struct treeNode *child[MAXCHILDREN]; // children of the node struct treeNode *sibling; // siblings for the node // what kind of node int lineno; // linenum relevant to this node NodeKind nodekind; // type of this node union // subtype of type { DeclKind decl; // used when DeclK StmtKind stmt; // used when StmtK ExpKind exp; // used when ExpK } subkind; // extra properties about the node depending on type of the node union // relevant data to type -> attr { OpKind op; // type of token (same as in bison) int value; // used when an integer constant or boolean unsigned char cvalue // used when a character char *string; // used when a string constant char *name; // used when IdK } attr; ExpType expType; // used when ExpK for type checking bool isArray; // is this an array bool isStatic; // is staticly allocated? // even more semantic stuff will go here in later assignments. } TreeNode; This structure was pulled from the Louden textbook and is just a suggestion. You can use whatever you want, as long as you can reliably represent the necessary information in memory and can produce the same results as in the provided examples. The following type definitions are used by the TreeNode structure above, and were also pulled from the Louden textbook: // Kinds of Operators // these are the token numbers for the operators same as in flex typedef int OpKind; // Kinds of Statements //typedef enum {DeclK, StmtK, ExpK} NodeKind; enum NodeKind {DeclK, StmtK, ExpK}; // Subkinds of Declarations enum DeclKind {VarK, FuncK, ParamK}; // Subkinds of Statements enum StmtKind {NullK, IfK, WhileK, ForK, CompoundK, ReturnK, BreakK, RangeK}; // Subkinds of Expressions enum ExpKind {OpK, ConstantK, IdK, AssignK, InitK, CallK}; // ExpType is used for type checking (Void means no type or value, UndefinedType means undefined) enum ExpType {Void, Integer, Boolean, Char, CharInt, Equal, UndefinedType}; // What kind of scoping is used? (decided during typing) enum VarKind {None, Local, Global, Parameter, LocalStatic}; Because this design was borrowed from the Louden textbook, you are free to use the example from the Tiny language compiler in the book (a link for the code for this is on the course website) as a reference to study and build from. You are free to use a C++ class rather than a struct to represent a TreeNode. You can use any structure that you like as long as your C- parser produces the same output as in the provided examples. Building a correct AST is of critical importance because all future assignments will require successful construction and traversal of the AST corresponding to a given C- program. Any bugs that you produce in the construction and traversal of your AST are going to follow you for the remainder of the semester, so be Very Careful. To encode a C- program as an AST, you will need to construct the right nodes at the right times during the parsing process. When you need to construct a node, you should use functions that you build like the newStmtNode function in the util.c file for the Tiny language compiler in the book (a link for the code for this is on the course website). These nodes will be passed up the tree as pointers and assembled to build the AST like in the example for the Tiny language in the Louden textbook. The following is an example to use for specifying the set of types that can represent a yylval in your bison file (this was discussed in class). %union { ExpType type; // For passing types (i.e pass a type in a decl like int or bool) TokenData *tokenData; // For terminals. Token data comes from yylex() in the $ vars TreeNode * tree; // For nonterminals. Add these nodes as you build the tree. } This is just a suggestion of what could be in your %union. Note, however, that you are expressly forbidden from accessing YYSTYPE as is done in the Louden book. Using YYSTYPE directly subverts type checking facilities and is too risky for you in this course. You should, as mentioned earlier, build functions to construct different types of TreeNode that you want to put in your AST. The following are prototypes that can be used for these functions. If you examine the Tiny compiler code you will see that this is how TreeNode children are added to a given TreeNode in almost all cases (the initialization of a var is one exception). The use of default values for arguments in these prototypes simplifies the construction of a given TreeNode and also simplifies the corresponding code for the action in your bison file. TreeNode *newDeclNode(DeclKind kind, ExpType type, TokenData* token=NULL, TreeNode* c0=NULL, TreeNode* c1=NULL, TreeNode* c2=NULL); TreeNode *newStmtNode(StmtKind kind, TokenData* token, TreeNode* c0=NULL, TreeNode* c1=NULL, TreeNode* c2=NULL); TreeNode *newExpNode(ExpKind kind, TokenData* token, TreeNode* c0=NULL, TreeNode* c1=NULL, TreeNode* c2=NULL); Note: Once you have your parser working correctly, make sure that you modify your parser to rename the unary – (minus) operator to the CHSIGN token and rename the unary * (asterisk) operator to the SIZEOF token. There is no need to leave these operators untokenized after we can differentiate between their use in a unary or binary context. 1.2.2 Printing the AST Once you have successfully constructed an AST for a given program, you will need to print the AST in order to verify that your AST is correct. How do you know that your AST is correct? The structure of a correct AST for a given program is dictated by your bison grammar. You can also compare your printed AST with examples that are provided in the test dataset for this assignment. To print your AST, you should make a function in your code called printTree. Your printTree function must print out the AST precisely as it is printed in the provided examples. Almost the same format is not good enough. The printTree function will be called after the AST has been constructed by your bison-generated parser. Your main() function in your bison file will look something like this: int main(int argc, char* argv[]) { // some stuff here… // Call yyparse to build the AST and put its address in // the “global” var named syntaxTree of type TreeNode* yyparse(); if (printTreeFlag) // Print the contents of the AST to stdout printTree(stdout, syntaxTree); // other stuff here… } The printTreeFlag variable in the example above is one of the user interface options that you are going to add for this assignment, and is discussed later in this document. The printTree function prints the important information within a given node, and then recursively applies the printTree function to all of the non-NULL children of the node (starting with child[0]), and then recursively applies the printTree function to the sibling pointer if it is not NULL. The first sibling found is numbered 1. Reading the AST provided for given examples is a good way to figure out what to do. For example, given the following C- program as input: ## C-S22 int main() begin int x; int y[5]; bool b; b <- true or false and 11 + 22 * 33 / 44 > 55; if 666>777 then begin x <- 111; end else y[3] <- x+222; while x>999 do begin x <- 333; if x<9 then break; x <- 444; break; end for z <- 1 to 8 do if z>x then x <- z; end Your c- (the program containing the C- scanner, parser, and print routines) will scan the input, produce an AST, and print the following output: Func: main returns type int [line: 2] . Child: 1 Compound [line: 3] . . Child: 0 Var: x of type int [line: 4] . . Sibling: 1 Var: y of array of type int [line: 5] . . Sibling: 2 Var: b of type bool [line: 6] . . Child: 1 Assign: <- [line: 8] . . . Child: 0 Id: b [line: 8] . . . Child: 1 Op: or [line: 8] . . . . Child: 0 Const true [line: 8] . . . . Child: 1 Op: and [line: 8] . . . . . Child: 0 Const false [line: 8] . . . . . Child: 1 Op: > [line: 8] . . . . . . Child: 0 Op: + [line: 8] . . . . . . . Child: 0 Const 11 [line: 8] . . . . . . . Child: 1 Op: / [line: 8] . . . . . . . . Child: 0 Op: * [line: 8] . . . . . . . . . Child: 0 Const 22 [line: 8] . . . . . . . . . Child: 1 Const 33 [line: 8] . . . . . . . . Child: 1 Const 44 [line: 8] . . . . . . Child: 1 Const 55 [line: 8] . . Sibling: 1 If [line: 10] . . . Child: 0 Op: > [line: 10] . . . . Child: 0 Const 666 [line: 10] . . . . Child: 1 Const 777 [line: 10] . . . Child: 1 Compound [line: 10] . . . . Child: 1 Assign: <- [line: 11] . . . . . Child: 0 Id: x [line: 11] . . . . . Child: 1 Const 111 [line: 11] . . . Child: 2 Assign: <- [line: 13] . . . . Child: 0 Op: [ [line: 13] . . . . . Child: 0 Id: y [line: 13] . . . . . Child: 1 Const 3 [line: 13] . . . . Child: 1 Op: + [line: 13] . . . . . Child: 0 Id: x [line: 13] . . . . . Child: 1 Const 222 [line: 13] . . Sibling: 2 While [line: 15] . . . Child: 0 Op: > [line: 15] . . . . Child: 0 Id: x [line: 15] . . . . Child: 1 Const 999 [line: 15] . . . Child: 1 Compound [line: 15] . . . . Child: 1 Assign: <- [line: 16] . . . . . Child: 0 Id: x [line: 16] . . . . . Child: 1 Const 333 [line: 16] . . . . Sibling: 1 If [line: 17] . . . . . Child: 0 Op: < [line: 17] . . . . . . Child: 0 Id: x [line: 17] . . . . . . Child: 1 Const 9 [line: 17] . . . . . Child: 1 Break [line: 17] . . . . Sibling: 2 Assign: <- [line: 18] . . . . . Child: 0 Id: x [line: 18] . . . . . Child: 1 Const 444 [line: 18] . . . . Sibling: 3 Break [line: 19] . . Sibling: 3 For [line: 22] . . . Child: 0 Var: z of type int [line: 22] . . . Child: 1 Range [line: 22] . . . . Child: 0 Const 1 [line: 22] . . . . Child: 1 Const 8 [line: 22] . . . Child: 2 If [line: 22] . . . . Child: 0 Op: > [line: 22] . . . . . Child: 0 Id: z [line: 22] . . . . . Child: 1 Id: x [line: 22] . . . . Child: 1 Assign: <- [line: 22] . . . . . Child: 0 Id: x [line: 22] . . . . . Child: 1 Id: z [line: 22] Note that this output, including the dots to indicate indenting levels, is what your program output is going to be compared against. Your c- must produce this exact same output for this particular input. When building your AST, if there is an optional expression or statement that is absent, be sure to set the corresponding child pointer to NULL. For example, compound statements can contain declarations, but sometimes they don’t. If a compound statement doesn’t contain declarations, then the child[0] pointer for that compound statement should be set to NULL. As another example, a return statement in C- may or may not contain an expression. If a given return statement doesn’t contain an expression, the child[0] pointer corresponding to the TreeNode for the return statement should be set to NULL. The default value for all pointers, including unnecessary children and siblings, should always be NULL. This means that your code must set these pointers to NULL and consistently check the value of a pointer before attempting to dereference the pointer. The bison code in the Tiny compiler example source code (on the course website) is a good example of how to connect the TreeNode nodes that you construct. The calculator example that we showed in class is a good example of how to organize the interaction between the scanner (flex) and the parser (bison). The following are two examples of functions that can be used to connect sibling pointers and to pass information down a list of siblings. These functions also demonstrate the use of some of the type variables in struct TreeNode. // Adds a TreeNode to a list of siblings. // Adding a NULL node to the sibling list is probably a programming error! TreeNode *addSibling(TreeNode *t, TreeNode *s) { if (s==NULL) { printf(“ERROR(SYSTEM): never add a NULL to a sibling list.\n”); exit(1); } if (t!=NULL) { TreeNode *tmp; tmp = t; while (tmp->sibling!=NULL) tmp = tmp->sibling; tmp->sibling = s; return t; } return s; } // Passes the isStatic and type attributes down the sibling list. void setType(TreeNode *t, ExpType type, bool isStatic) { while (t) { t->expType = type; t->isStatic = isStatic; t = t->sibling; } } 1.3 User Interface Options Once you have successfully built your parser, you are ready to make changes to the interface of your cprogram, which is now starting to look like a compiler. Your program must be named c- (note the lowercase) as was the case in the last assignment. Your c- must be able to read the stream of tokens contained within a file whose name is passed as an argument on the command line, or to read the stream of tokens from standard input (stdin) if no argument is passed on the command line. This is the same behavior that was present in the last assignment. 1.3.1 The -p Option Your c- will always construct an AST for a legal C- program. If the user specifies the -p option on the command line, your c- will print the AST that it built to stdout. For example, if the user types the following command: $bash> c- -p mergeSort.cYour c- will build the AST for the program in mergeSort.c- and will print the AST to stdout. Whereas if the user types the following command: $bash> c- mergeSort.c- Your c- will build the AST for the program in mergeSort.c- but will not print the AST to stdout. The -p option will set a variable named printTreeFlag (as shown earlier in this assignment) so that you can determine whether or not to print the AST after it has been built. The -p option is the first of many options that we will be adding to the compiler interface during the semester. There is a Unix function called getopt that is used to handle command line arguments that you are free to use. A custom version called ourgetopt is provided on the course website. This custom version is much more portable than standard getopt implementations, and is provided so that you can reap the benefits of getopt while building your compiler on platforms other than Unix. 1.3.2 The -d Option Your c- will also take the -d option on the command line. This option enables debugging of the parser by setting the yydebug flag in bison to 1. For example if the user enters the following command: $bash> c- -d mergeSort.cYour -c will build an AST for the program in mergeSort.c and information about what is happening during the parsing of the file will be printed to stdout. The information that is printed will automatically be printed by the parser generated by bison once you set yydebug to 1. If the user doesn’t specify the -d option, no information about what is happening during parsing will be printed. In order to access the yydebug flag that is inside the parser generated by bison, be sure to declare it as an external integer (extern int yyedebug) inside your code. 2. Deliverables and Submission Your homework will be submitted as an uncompressed .tar file that contains no subdirectories. Your .tar file must contain all files and code necessary to build and test your compiler. If you use ourgetopt, be sure to include the ourgetopt files in your .tar archive. The .tar file is submitted to the class submission page. You can submit as many times as you like. The last file you submit before the deadline will be the only one graded. No late submissions are accepted for this assignment. For all submissions you will receive email at your uidaho address showing how your file performed on the pre-grade tests. The grading program will use more extensive tests, so thoroughly test your program with inputs of your own. Your code must compile successfully and have no shift/reduce or reduce/reduce conflicts from bison. Your program must run with no runtime errors such as segmentation violations (SIGSEGVs).