Description
EECS 281 Programming Project 1 Letterman Reboot (Path Finding)
Overview The evil Spell Binder is loose, and it’s up to Letterman to save us! Letterman hasn’t been very active lately, and his power of changing one word into another by changing only one letter needs upgrading. Yes, in the old days he could change “tickle” into “pickle”, but can this new Letterman 2.0 change “evil” into “good”? Only you can say! Your program will be given a dictionary of words, a “start” word and an “ending” word, and which types of conversions you are allowed to perform (such as changing one letter to another, adding or deleting a letter, etc.). Your goal is to convert the start word to another word, to another word, etc., eventually leading to the ending word, making one change at a time. You must use the given rules of conversion, and only use valid words from the dictionary along the way. For example, you could change “chip” into “chop”, but you couldn’t change “chip” into “chiz” because the word “chiz” doesn’t exist in the dictionary. Program Input You must help Letterman navigate through the Spell Binder’s word traps. You will be given a starting and ending word, and a dictionary to search. The starting and ending words, and any other necessary flags, will be given on the command line when the program is run. Input file format (The Dictionary) The program gets its dictionary from standard input (cin). The dictionary can be in a file, and you redirect that file to cin when you run the program (details later). There are two different types of dictionaries that the program needs to be compatible with: complex (C) and simple (S). For both dictionaries, the first line will be a single character specifying the dictionary type ‘C’ or ‘S’. Unlike the output mode, which is given on the command line (see below), this is part of the file. The second line will be a single positive integer N indicating the number of lines in the dictionary not counting the first line and lines that are comments (i.e. for simple dictionaries, the number of words, and for complex dictionaries, the number of word-generating lines). Version 09-07-20 Current version by: David Paoletti Previous versions: David Paoletti, David Snider, Laura (Wendlandt) Burdick, Luum Habtemariam, Jiaxi Wu © 2020 Regents of the University of Michigan We do not place a limit on the magnitude of N and neither should your code. Comments may also be included in any input file. Comment lines begin with “//” (without quotes) in column 1, and are allowed anywhere in the file after the second line. When developing your test files, it is good practice to place a comment on line 3 describing the nature of the dictionary in the test file. Any dictionaries with noteworthy characteristics for testing purposes should also be commented. You should discard all existing comments from the input file; do not save them in memory as part of your data structures. Additionally, there may be extra blank/empty lines at the end of any input file: your program should ignore them. If you see a blank line in the file, you may assume that you have hit the end. Simple Dictionary The first type of dictionary that your program needs to handle is the simple dictionary. This is a simple text file specifying the words in the dictionary, one word per line. Each “word” will be a sequence of alphabetic characters. Each word in the dictionary is unique; there will never be two copies of the same word. Here is a valid input file: S 10 // Just a short example dictionary. Although these words // are in alphabetical order, that is not required. chip chop junk leet let shin ship shop shot stop Complex Dictionary The second type of dictionary that your program needs to handle is a complex dictionary. Like the simple dictionary, there will be one string per line. However, in this dictionary, each line could be a simple alphabetic string, like the simple dictionary, or it could contain special characters. If a line contains special characters, then it will be used to generate alphabetic words that are a part of the dictionary. Each line will contain at most one special character 2 (except in the case of insert-each, where a pair of square brackets counts as one special character). Here are the special characters that may be included: – Reversal (&): If an ampersand appears at the end of the word, then both the word and the reversal of the word are generated, in that order. An ampersand will not appear in the middle of a word. – Example: “desserts&” – “desserts” and “stressed” are generated, in that order – Insert-each ([]): If a set of characters appears inside square brackets, each character is inserted into the word, generating N words in the order of the letters, where N is the number of characters within the square brackets. There will not be square brackets without letters within them and there will not be duplicate letters. – Example: “tre[an]d” – “tread” and “trend” are generated, in that order – Example: “c[auo]t” – “cat”, “cut”, and “cot” are generated, in that order – Swap (!): If an exclamation point appears after two characters, then the original string and the string with the two previous characters swapped are generated, in that order. An exclamation point will only occur if at least two characters precede it. – Example: “bar!d” – “bard” and “brad” are generated – Double (?): If a question mark appears after one character, then the original string and the string with the one previous character doubled are generated, in that order. A question mark will only occur if at least one character precedes it. – Example: “le?t” – “let” and “leet” are generated Here is an example complex dictionary, with words similar to the previous simple dictionary: C 7 //This generates the dictionary: //chip, chop, junk, star, tsar, ship, shop, //shot, stop, pots, let, leet ch[io]p junk st!ar sh[io]p shot stop& le?t Morph Modes There are a few ways that Letterman can convert words to other words. – Change: Letterman can change a single letter of a word – Example: he can turn “pun” into “fun” – Swap: Letterman can swap any single pair of adjacent letters 3 – Example: he can turn “brad” into “bard” – Insert: Letterman can add a letter – Example: he can turn “stun” into “stunt” – Delete: Letterman can remove a letter – Example: he can turn “boar” into “bar” The modifications that Letterman is allowed to make will be determined by arguments on the command line. Command line arguments Your program should take the following case-sensitive command line options (when we say a switch is “set”, it means that it appears on the command line when you call the program): ● –stack, -s: If this switch is set, use the stack-based routing scheme. ● –queue, -q: If this switch is set, use the queue-based routing scheme. ● –change, -c: If this switch is set, Letterman is allowed to change one letter into another. ● –swap, -p: If this switch is set, Letterman is allowed to swap any two adjacent characters. ● –length, -l: If this switch is set, Letterman is allowed to modify the length of a word, by inserting or deleting a single letter. ● –output (W|M), -o (W|M): Indicates the output file format by following the flag with a W (word format) or M (modification format). If the –output option is not specified, default to word output format (W). If –output is specified on the command line, the argument (either W or M) to it is required. See the examples below regarding use. ● –begin , -b : This specifies the word that Letterman starts with. This flag must be specified on the command line, and when it is specified a word must follow it. ● –end , -e : This specifies the word that Letterman must reach. This flag must be specified on the command line, and when it is specified a word must follow it. ● –help, -h: If this switch is set, the program should print a brief help message which describes what the program does and what each of the flags are. The program should then exit(0) or return 0 from main(). When we say –stack, or -s, we mean that calling the program with –stack does the same thing as calling the program with -s. See getopt_long() for how to do this. Legal command line arguments must include exactly one of –stack or –queue (or their respective short forms -s or -q). If none are specified or more than one is specified, the program should print an informative message to standard error (cerr) and call exit(1). A legal command line must specify at least one of –change, –length, and –swap (or their respective short forms). Examples of legal command lines: ● ./letter –stack -b ship -e shot –length < infile 4 ○ This will run the program using a the stack algorithm and word output mode. The only modifications allowed on words are inserting/deleting letters, NOT changing one letter into another. The file “infile” on disk is redirected to cin. ● ./letter -b ship -e shot -c –queue –output W < infile | more ○ This will run the program using the queue algorithm, word output mode, and letters can only be changed into other letters. ○ Output is sent to the more program; press the spacebar after each page. ● ./letter –stack –output M -b ship -e shot –length –change < infile > outfile ○ This will run the program using the stack algorithm, modification output mode, and letters can be changed, inserted, or deleted. ○ Output is saved on disk in “outfile”, destroying that file if it already exists. Examples of illegal command lines: ● ./letter –queue -s -b ship -e shot -c < infile > outfile ○ Contradictory choice of routing ● ./letter -b ship -e shot -c < infile | more ○ You must specify either stack or queue ● ./letter -s -b ship -e shot < infile > outfile ○ You must specify at least one of change, length, and swap. Routing schemes You are to develop two routing schemes to help Letterman get from the starting word to the ending word: ● A queue-based routing scheme ● A stack-based routing scheme In the routing scheme use a data structure (queue or stack, or better yet a deque) of words to check, which we will refer to as the “search container”. First, initialize the algorithm by adding the starting word into the search container. Mark this word as already discovered. Then loop through the following steps: 1. Remove the next word from the search container: this becomes the “current” word. 2. Investigate the current word: add all words to the search container that are sufficiently similar to (as defined by the command line) the current word that are available (not already discovered). Add any such words in the following order: beginning of dictionary to the end. Do not add words that have already been discovered. Mark each word added to the search container as discovered. 3. As you add these words to the search container, check to see if any of them is the ending word; if so, stop; else go back to step 1. If the search container becomes empty before you reach the ending word, the search has failed and there is no series of words to foil the Spell Binder’s evil plan. 5 We use different terminology carefully here, and will try to do so in office hours. “Discovered” is when a word is added to the search container, “investigated” is when that word leaves the search container to become the current word, and you check for words that are similar to it. Since every word can be discovered at most once, it can be investigated at most once. Some words might be discovered but never investigated (they were still in the search container when the ending word was discovered). Output file format The program will write its output to standard output (cout), and there are two possible output formats. The output format will be specified through a command line option ‘–output’, or ‘-o’, which will be followed by an argument of W or M (W for word output and M for modification output). See the section on command line arguments below for more details. For both output formats, you will show the path of words you took from start to finish. In both cases you should first print the number of words in the morph, and include the start word (see examples below). Word output mode (W): For this output mode, you should print each word. Beginning at the starting word, print the words in the morph until you reach the ending word. For both the simple and complex dictionary sample inputs specified earlier, using the queue-based routing scheme and word (W) style output and trying to change “chip” into “stop”, you should produce the following output: Words in morph: 4 chip chop shop stop Using the same input file but with the stack-based routing scheme, you should produce the following output: Words in morph: 4 chip ship shop stop 6 Modification output mode (M): For this output mode, instead of printing each word, each line of output should be how to change from one word to the next. The start word should always be displayed. The modification lines which follow the start word have one of the following forms: ● c,, ● i,, ● d, ● s, These four forms correspond to a letter changing (c), a letter being inserted (i),a letter being deleted (d), or two letters being swapped (s). The always indicates the index of the modification (for swaps, the index of the first letter swapped), and the is the new letter (either changed to or inserted). If there is one change, but two possible places for the index, always specify where the first difference occurs. For example, if “put” changed into “putt”, the characters at positions 0, 1, and 2 are the same, the new letter occurs at index 3. Similarly for changing “putt” into “put”, the deletion occurs at index 3 (because indices 0 through 2 are the same characters). The following are examples of correct output format in (M) modification mode that reflect the same solution as the word output format above. For the queue solution: Words in morph: 4 chip c,2,o c,0,s c,1,t In the above output, it is saying to start with the word “chip”, change the letter at index 2 into o (producing chop), then change the letter at index 0 into s (producing shop), then change the letter at index 1 into t (producing stop). For the stack solution: Words in morph: 4 chip c,0,s c,2,o c,1,t 7 There is only one acceptable solution per routing scheme for each dictionary and start/end word pair. If no valid morphing exists (such as trying to change “ship” into “junk”), the program should simply display “No solution, X words discovered.” (without quotes) on a line by itself, instead of the “Words in morph” line. The X represents the number of words in the dictionary that were part of the search. A word has been “discovered” if it was ever added to the search container. For example, if you were trying to change “ship” into “junk”, simple dictionary with change mode on, the output would read “No solution, 7 words discovered.”. This output will be the same for either word or morph output mode, and for either stack or queue mode. Errors you must check for A small portion of your grade will be based on error checking. You must check for the following errors: ● More or less than one –stack/-s or –queue/-q on the command line. ● No –change/-c, –length/-l, or –swap/-p on the command line. ● The –output/-o flag is followed by an invalid character. ● Either the start or end word is not specified, or does not exist in the dictionary. ● The –change/-c and/or –swap/-p flags are specified, but –length/-l is not, and the start/end words do not match in length (creating an impossible situation). In all of these cases, print an informative error message to standard error (cerr) and call exit(1). The autograder will not look at the actual text of the error message itself, but these error messages may help you while you’re debugging your program. Anyone on staff can look at your submission and tell you what error you displayed before the exit(1). So if your program does an exit(1) but the autograder informs you it was a valid test case, come to office hours. You do not need to check for any other errors. Assumptions you may make ● The first line of the dictionary will contain either a capital letter C or S and that letter will correctly reflect the dictionary type. ● The second line of the dictionary will contain a number, and the number of words in the file will match that number (the file will contain that many words/word generating lines). ● Comments will begin with // as the first two characters on a line, and can have any number of words on that line. No comment will appear before the number of words on line 2. ● The number of words on line 2 does NOT include comment lines. ● A dictionary will not contain any duplicate words. ● Other than comment lines, the dictionary will have one word per line (thus words will never contain blank spaces). 8 ● For complex dictionaries, special characters will not appear incorrectly (for example, the reversal symbol will not appear in the middle of a word). ● You do not have to consider upper- and lower-case versions of words to be the same (for example “a” and “A” are different words). Test Files It is extremely frustrating to turn in code that you are “certain” is functional and then receive half credit. We will be grading for correctness primarily by running your program on a number of test cases. If you have a single bug that causes many of the test cases to fail, you might get a very low score on the project even though you completed 95% of the work. Most of your grade will come from correctness testing. Therefore, it is imperative that you test your code thoroughly. To help you do this we will require that you write and submit a suite of test files that thoroughly test this project. Your test files will be used to test a set of buggy solutions to the project. Part of your grade will be based on how many of the bugs are exposed by your test files. We say a bug is exposed by a test file if the test file causes our buggy solution to produce different output from our correct solution. Each test file should be a dictionary input file. Each test file file should be named test-n-start-end-flags.txt, where 0 < n <= 15 for each test file. The start and end indicate the start/end words, and the flags portion should include a combination of letters which correspond to command line arguments. Valid letters in the flags portion of the filename are: ● s: Run stack mode ● q: Run queue mode ● c: Run in change mode ● l: Run in length mode ● p: Run in swap mode ● w: Produce word output ● m: Produce modification output The flags that you specify as part of your test filename should allow us to produce a valid command line. For instance, don’t include both s and q, but include one of them; include at least one of c, l and p; include at most one of w or m, but if you leave it off, we’ll run in word output mode. For example, a valid test file might be named test-1-ship-shot-scw.txt (change from ship to shot, stack mode, change mode, word output). Given this test file name, we would run your program with a command line similar to the following (we might use long or short options, such as –change instead of -c): ./letter –begin ship -e shot –stack -c -o W < test-1-ship-shot-scw.txt > test-1-out.txt Each dictionary may have no more than 20 words. You may submit up to 15 test files (though it 9 is possible to get full credit with fewer test files). The dictionaries the autograder runs with your solution are NOT limited to 20 words; your solution should not impose any size limits (as long as sufficient system memory is available). Your complex dictionary might start with 20 words but produce more; that is still a valid dictionary. Input and Output Redirection We are using input and output redirection in some of the above examples. While we are reading our input from a file and sending our output to another file in this case, we are NOT using file streams! The < redirects the file specified by the next command line argument to be the standard input (stdin/cin) for the program. This is much easier than retyping the dictionary every time you run the program! The > redirects the output (to stdout/cout) of the program to be printed to the file specified by the next command line argument. The | pipes the output of your program to the input of the command that follows, such as more (which displays with page breaks). The operating system makes calls to cin to read the input file and it makes calls to cout to write to the output file. Come to office hours if this is confusing! Runtime The program must run to completion within 35 seconds of total CPU time (user + system). This is more time than you should need. See the time manpage for more information (this can be done in Unix by entering “man time” to the command line). We may test your program on very large dictionaries (up to several hundred thousand words). Be sure you are able to navigate to the end word in large dictionaries within 35 seconds. Smaller dictionaries should run MUCH faster. Libraries and Restrictions Unless otherwise stated, you are allowed and encouraged to use all parts of the C++ STL and the other standard header files for this project. You are not allowed to use other libraries (eg: boost, pthread, etc). You are not allowed to use the C++ smart pointers (shared or unique), or the C++11 regular expressions library (it is not fully implemented in the version of gcc used by the autograder) or the thread/atomics libraries (it spoils runtime measurements). Submission to the Autograder Do all of your work (with all needed files, as well as test files) in some directory other than your home directory. This will be your “submit directory”. Before you turn in your code, be sure that: ● Every source code and header file contains the following project identifier in a comment at the top of the file: // Project Identifier: 50EB44D3F029ED934858FFFCEAC3547C68768FC9 ● The Makefile must also have this identifier (in the first TODO block). 10 ● DO NOT copy the above identifier from the PDF! It might contain hidden characters. Copy it from the README.txt file instead (this file is included in the samples archive).. ● You have deleted all .o files and your executable(s). Typing ‘make clean’ shall accomplish this. If make clean does not remove all of these files, you will lose points. ● Your makefile is called Makefile. Typing ‘make -R -r’ builds your code without errors and generates an executable file called letter. The command line options -R and -r disable automatic build rules (which will not work on the autograder). ● Your Makefile specifies that you are compiling with the gcc optimization option -O3. This is extremely important for getting all of the performance points, as -O3 can often speed up code by an order of magnitude. You should also ensure that you are not submitting a Makefile to the autograder that compiles with the debug flag, -g, as this will slow your code down considerably. If your code “works” when you don’t compile with -O3 and breaks when you do, it means you have a bug in your code! ● Your test files are named test-n-start-end-flags.txt and no other project file names begin with test. Up to 15 tests may be submitted. ● The total size of your program and test files does not exceed 2MB. ● You don’t have any unnecessary files or other junk in your submit directory and your submit directory has no subdirectories. ● Your code compiles and runs correctly using version 6.2.0 of the g++ compiler. This is available on the CAEN Linux systems (that you can access via login.engin.umich.edu). Even if everything seems to work on another operating system or with different versions of GCC, the course staff will not support anything other than GCC 6.2.0 running on CAEN Linux. In order to compile with g++ version 6.2.0 on CAEN you must put the following at the top of your Makefile: PATH := /usr/um/gcc-6.2.0/bin:$(PATH) LD_LIBRARY_PATH := /usr/um/gcc-6.2.0/lib64 LD_RUN_PATH := /usr/um/gcc-6.2.0/lib64 ● To run the generated executable, you need to run module load gcc/6.2.0 once per session. You can add this line to your ~/.bashrc file if you don’t want to run it on every login. ● Note that valgrind may report “still reachable” memory when compiling with GCC 6.2.0. This is not a concern on the autograder; you need only worry about “definitely lost” memory. Turn in all of the following files: ● All your .h and .cc or .cpp files for the project ● Your Makefile ● Your test files You must prepare a compressed tar archive (.tar.gz file) of all of your files to submit to the autograder. One way to do this is to have all of your files for submission (and nothing else) in one directory. Go into this directory and run this command: dos2unix -U *; tar czvf ./submit.tar.gz *.cpp *.h *.cc *.c Makefile test*.txt 11 This will prepare a suitable file in your working directory. If you’re using our Makefile, instead just type make fullsubmit Submit your project files directly to either of the two autograders at: https://g281-1.eecs.umich.edu/ or https://g281-2.eecs.umich.edu/. You should load-balance yourselves: if you see that there are 10 people in the queue on autograder 1 and none for autograder 2, submit your project to autograder 2. Do not submit to both autograders at once! You can safely ignore and override any warnings about an invalid security certificate. When the autograders are turned on and accepting submissions, there will be an announcement. The autograders are identical and your daily submission limit will be shared (and kept track of) between them. You may submit up to three times per calendar day with autograder feedback (double that during Spring semester). For this purpose, days begin and end at midnight (Ann Arbor local time). We will use your best submission for final grading. If you would instead like us to use your LAST submission, see the autograder FAQ page, or use this form. We strongly recommend that you use some form of revision control (ie: SVN, GIT, etc) and that you ‘commit’ your files every time you upload to the autograder so that you can always retrieve an older version of the code as needed. If you use an online revision control system, make sure that your projects and files are PRIVATE; many sites make them public by default! If someone searches and finds your code and uses it, this could trigger Honor Code proceedings for you. Please make sure that you read all messages shown at the top section of your autograder results! These messages will help explain some of the issues you are having (such as losing points for having a bad Makefile). Grading 80 points — Your grade will be primarily based on the correctness of your algorithms. Your program must have correct and working stack and queue algorithms and support both types of output modes. Additionally: Part of your grade will be derived from the runtime performance of your algorithms. Fast running algorithms will receive all possible performance points. Slower running algorithms may receive only a portion of the performance points. The same applies for use of system memory: programs using less memory will receive full points, solutions that use too much will lose points. The autograder machines keep track of the fastest run times (“click on View scoreboard” from the autograder project page). You may track your progress relative to other students and instructors there. 10 points — Not leaking memory. You can ensure that your program does not leak memory by running it with valgrind. 10 points — Test file coverage (effectiveness at exposing buggy solutions). We reserve the right to deduct up to 5 points for poor style (1000 line main, 20 single-letter 12 variables, etc.). Readability is generally defined as follows: ● Clean organization and consistency throughout your overall program ● Proper partitioning of code into header and cpp files ● Descriptive variable names and proper use of C++ idioms ● Omitting globals, unnecessary literals, or unused libraries ● Effective use of comments ● Reasonable formatting – e.g an 80 column display ● Code reuse/no excessive copy-pasted code blocks Hints and advice Check the other PDFs (Project 1 the STL and You, Examples). There’s also a tutorial video! ● It is extremely helpful to compile your code with the gcc options: -Wall -Wextra -pedantic -Wconversion -Werror, and the default Makefile will turn on these flags. This will help you catch bugs in your code early by having the compiler point out when you write code that is either of poor style or might result in behavior that you did not intend. ● Design your data structures and work through algorithms on paper first. Draw pictures. Consider different possibilities before you start coding. If you’re having problems at the design stage, come to office hours. After you have done some design and have a general understanding of the assignment, re-read this document. Consult it often during your solution’s development to ensure that all of your code is in compliance with the specification. ● Always think through your data structures and algorithms before you code them. It is important that you use efficient algorithms in this project and in this course, and coding before thinking often results in inefficient algorithms. ○ If you are considering linked lists, don’t. We’ll see later (Lecture 7) that they are often the wrong choice, and they are in this project. ● Read input (the dictionary) from cin; do not create a file stream and open a file by name. ● Display the specified output to standard output (cout). You may print whatever diagnostic information you wish to standard error (cerr). However, make sure it does not scale with the size of input, or your program may not complete within the time limit for large test cases. For example, if the command line is invald, send a descriptive, identifiable reason to cerr before calling exit(1). ● Don’t try to write the entire program at once. ○ Start with a single input mode (start with simple dictionary). ○ Start with a single morphing mode (we suggest change mode). ○ Start with a single output mode (word mode, rather than morph mode). ● Maybe start by finding if a solution is possible or not. ○ Print correct output if no solution is possible. ○ Then work on correct output if a solution is possible. 13 ● Add other morphing modes, dictionary type, output mode after the basics are working. ● This is not an easy project. Start reading, thinking and planning immediately! Have fun coding! Appendix A — Larger Example This example uses a slightly larger dictionary, and allows letter changes and insertions/deletions. The dictionary is on this page, followed by queue output and stack output. You can easily create a complex dictionary containing the same words, and we leave that to you. Note that a complex dictionary might give slightly different output, since the words in the in-memory dictionary might appear in a different order (i.e. so[au]p and so[iu]l produce soap soup soil soul). Dictionary: S 20 // Used for Appendix A example rain ruin run sail she shy ski skip sky slap slip soap soil soul soup sue sun tail trail train Let’s use the simple dictionary given above to change sky into sun (I wanted to change snow into sun, but snow wasn’t in the dictionary). 14 Word Output (Queue): Words in morph: 5 sky shy she sue sun Modification Output (Queue): Words in morph: 5 sky c,1,h c,2,e c,1,u c,2,n Changing sky into sun again, but with stack mode. Word Output (Stack): Words in morph: 17 sky ski skip slip slap soap soup soul soil sail tail trail train rain ruin run sun Modification Output (Stack): Words in morph: 17 15 sky c,2,i i,3,p c,1,l c,2,a c,1,o c,2,u c,3,l c,2,i c,1,a c,0,t i,1,r c,4,n d,0 c,1,u d,2 c,0,s Appendix B — Autograder (AG) Information You’ll notice that the test cases on the autograder have a somewhat intricate naming pattern. In general that pattern follows the following scheme: ● First letter indicates whether the dictionary is simple (S) or complex (C) ● Second letter indicates the size of the dictionary – small (S, approximately 20 words), medium (M, about 4k words), large (L, about 45k words), or huge (H, about 350k words). ● Number of the test, completely arbitrary. ● Next letters indicate what changes are allowed to the words: changing one letter into another (C), modifying the length of a word (L), or swapping letters (P) ● Last letter indicates whether output is word format (W) or modification format (M) For complex dictionaries, the test case name is followed by an underscore and a series of letters. These letters indicate which special characters are included in the dictionary – reverse (R), insert-each (I), double (D), and swap (S). For example, a test case named “CL7QCLW_RIDS” is a complex dictionary, of large size, test number 7, queue mode, change and length modes, word output, and the complex dictionary was produced using all 4 modifications (R, I, D, S). 16 Some test cases don’t follow this naming scheme. Any test case starting with “INV” means that your solution should exit 1 because of some sort of INValid input. The test cases starting with “Spec” use the first example dictionary in the spec (page 2), the test cases starting with “SpecComp” use the complex dictionary in the spec (page 3), and the test cases starting with “AppA” use the dictionary from Appendix A. The test files that you submit are run against our correct solution, your solution, and our “buggy” solutions. If the output of your solution differs from our “correct” solution, we will display both your output and our output for the first such test file encountered. The test files section of the GA feedback will display the number of bugs, and how many “found” bugs will earn points. 17
EECS 281 –Programming Project 2 Mine All Mine (Priority Queues)
Project Identifier Overview Project Goals Part A: Gold Mining Overview Breaking Out of the Mine Example TNT Explosions Command Line Input Output Full I/O Example Input (spec-M.txt and spec-R.txt): Output: PART B: Priority Queue Implementation Eecs281PQ<> interface Implementing the Priority Queues Compiling and Testing Priority Queues Logistics The std::priority_queue<> About Comparators Version 09-29-20 Credits: David Paoletti, Marcus Darden, Ian Lai, Spencer Kim, Brian Wang, Nathan Moos, Katie Matton © 2020 Regents of the University of Michigan 1 Libraries and Restrictions Testing and Debugging Test File Details Submitting to the Autograder Grading Appendix A: Printing out the Grid Appendix B: Tips (mostly PairingPQ, but others also) Appendix C: Autograder Information Project Identifier You MUST include the project identifier at the top of every file you submit to the autograder as a comment. This includes all source files, header files, and your Makefile (in the first TODO block): // Project Identifier: 19034C8F3B1196BF8E0C6E1C0F973D2FD550B88F Overview This project is broken into two parts (A and B). Part A is your main(), using the STL std::priority_queue<>; part B is you implementing several different versions of a priority queue. Project Goals ● Understand and implement several kinds of priority queues. ● Be able to independently read, learn about, and implement an unfamiliar data structure. ● Be able to develop efficient data structures and algorithms. ● Implement an interface that uses templated “generic” code. ● Implement an interface that uses inheritance and basic dynamic polymorphism. ● Become more proficient at testing and debugging your code. Part A: Gold Mining For Part A, you should always use std::priority_queue<>, not your templates from Part B. In Part A you probably do not need or want to use pointers; in Part B you will have to (for the Pairing Heap). © 2020 Regents of the University of Michigan 2 Overview You are an adventurous gold miner. However, in your avarice you’ve ignored several safe-tunneling practices, and a recent earthquake has left you trapped! Luckily, out of paranoia, you always carry ridiculous quantities of dynamite sticks with you. You need to blast your way out and make the most of each dynamite stick by blasting the piles of rubble which take the fewest sticks to destroy. Breaking Out of the Mine The mine you are trapped in can be represented with a 2-dimensional grid. There are 2 types of tiles: ● Tiles containing rubble. ○ Think of cleared tiles as containing 0 rubble. A tile in the mine could contain 0 before you clear it, that means that it never contained rubble. ● Tiles containing TNT. You (the miner) start on a specified tile. At every iteration, you will attempt to blast away the “easiest” tile you can “access”, until you escape! Definition of Discovered The mine is very dark and you cannot see very far. When standing on a clear tile there are only four tiles that you can discover from your current tile: up, down, left and right (except for edge-of-map conditions). This is true in every case except for the start of the simulation, when you can only discover the starting tile. Adding tiles to the primary priority queue counts as “discovering” them. They can never be discovered (added to the primary PQ) twice. Definition of Investigating The priority queue will always tell you what your “next” tile will be. Investigating is taking the “next” tile from the priority queue and making it your “current” location. When you investigate a tile, you must clear it if it contains rubble or TNT. After clearing the tile, you can then discover any of the four tiles visible from your current location, add them to the priority queue, etc. You use the priority queue to remember tiles that you have discovered, but have not yet investigated. Definition of Easiest Tile The easiest tile you can reach is defined as follows, in the stated order: 1. Any TNT tile you can reach. 2. The lowest rubble-value tile you can reach. Tie-breaking In the event of ties (two TNT tiles or two rubble tiles with the same rubble-value): 1. Investigate the tile with the lower column coordinate first. 2. If the tiles have the same column coordinate, investigate the tile with the lower row coordinate. Clearing Tiles © 2020 Regents of the University of Michigan 3 When clearing away rubble tiles, the tile turns into a cleared tile. When clearing away TNT tiles, the following happens: ● The TNT tile becomes a cleared tile. ● All tiles touching the TNT tile are also “cleared” ○ If a TNT tile is touching another TNT tile, this will cause a chain explosion. ○ Diagonals are not considered for TNT explosions. Definition of Escape The miner escapes when their current tile has been cleared and is at the edge of the grid. Example In the following example, you start at position [1,2] (row 1, column 2). The tiles that the miner has investigated are red and the tiles that the miner has discovered are blue. The tile that the miner just cleared is red and underlined. Note: all cleared tiles will be 0, because you must clear a tile as part of investigating it. Positive integers signify rubble tiles (0 is a clear tile) and the value -1 signifies a TNT tile. Please note: This example mine is for illustrative purposes and is not the same as the input file described in the Full I/O Example section. To make things clearer, there are bold indices on the edges that refer to row and column number – they are not a part of the actual input file. 0 1 2 3 4 0 20 20 20 20 20 1 20 100 10 20 15 2 20 15 5 0 20 3 20 20 0 -1 100 4 100 -1 -1 20 20 At the first iteration, the only tile that can be discovered is the starting location, [1,2]. The miner clears this tile and then can discover other tiles. 0 1 2 3 4 0 20 20 20 20 20 1 20 100 0 20 15 2 20 15 5 0 20 3 20 20 0 -1 100 4 100 -1 -1 20 20 Next, the miner will investigate and clear tile [2,2] because there are no TNT tiles in view and it has the lowest rubble-value. Clearing that tile allows us to discover more tiles! © 2020 Regents of the University of Michigan 4 0 1 2 3 4 0 20 20 20 20 20 1 20 100 0 20 15 2 20 15 0 0 20 3 20 20 0 -1 100 4 100 -1 -1 20 20 The tiles [2,1], [2, 3], and [3,2] are now discoverable (but still uninvestigated). The miner then investigates tile [3,2]. Both this tile and tile [2,3] have an equally low level of difficulty. The tile [3,2] is chosen because its lower column value of 2 breaks the tie. 0 1 2 3 4 0 20 20 20 20 20 1 20 100 0 20 15 2 20 15 0 0 20 3 20 20 0 -1 100 4 100 -1 -1 20 20 At this point, there are two TNT tiles which could be investigated next! However, due to the tie-breaking rules, the miner will choose to blow up the TNT at [4,2] instead of [3,3] (this choice is made because it is at the top of the priority queue). Blowing up the TNT tile at [4,2] clears all the tiles touching it, creating a chain reaction with the TNT tile at [4,1]. After all the TNT explosions have been resolved, the grid looks like the following: 0 1 2 3 4 0 20 20 20 20 20 1 20 100 0 20 15 2 20 15 0 0 20 3 20 0 0 -1 100 4 0 0 0 0 20 An explosion makes tiles discovered but does not investigate the tile (or discover around it). So, for example, [3, 1] was blown up by TNT and thus discovered. Therefore it is a candidate to be investigated next, but [3, 0] (adjacent to it) is not. Since the current location is [4,2], and this tile is now cleared and on the edge of the map, we have escaped! TNT Explosions As you will see in the Output section, you need to keep track of the order in which tiles are cleared. © 2020 Regents of the University of Michigan 5 When a TNT tile detonates, all tiles that are cleared as a result of the TNT detonation (including chain reactions) are cleared in order from “easiest” to “hardest” (as defined in Breaking Out of the Mine, including tie-breaker rules). These tiles should also be discovered. As stated in Breaking Out of the Mine, do NOT consider diagonals; even TNT only destroys rubble piles up, down, left and right of it. When a TNT detonation occurs, you should use a priority queue to determine detonation order. Push all the detonated tiles into a separate TNT priority queue, and then pop them out in priority order. You need to use some type of priority queue, because TNT blows up other TNT first, followed by smaller piles of rubble, etc. Notice that, as you progress through your TNT priority queue, you may blow up a tile that is already waiting in your primary priority queue. If this happens, you have to make sure that the new rubble value of 0 comes out of the primary PQ sooner than the old, non-0 value. Think about how it will be possible for the primary PQ to actually know that a tile has changed! What if there were two entries for the same tile, and you ignore the second one? See the Full I/O Example for more details. Command Line Your program MineEscape should take the following case-sensitive command-line options: ● -h, –help ○ Print a description of your executable and exit(0). ● -s, –stats N ○ An optional flag that tells the program it should print extra summarization statistics upon termination. This optional flag has a required argument N. Details are covered in the Output section. ● -m, –median ○ An optional flag that tells the program it should print the median difficulty of clearing a rubble tile that the miner has seen. Details are covered in the Output section. ● -v, –verbose ○ An optional flag that tells the program it should print out every rubble value (or TNT) as a tile is being cleared. Details are covered in the Output section. Examples of legal command lines: ● ./MineEscape -v < infile.txt ● ./MineEscape –stats 15 < infile.txt > outfile.txt We will not be error checking your command-line handling, but we expect your program to accept any valid combination of input parameters. Input and Output Redirection In order to read an input file, you will use input redirection, just like you did in project 1. If you want to send your output to a file, you can also use output redirection, as seen in one of the command line © 2020 Regents of the University of Michigan 6 examples above. The < redirects the file specified by the next command line argument to be the standard input (stdin/cin) for the program. The > redirects the standard output (stdout/cout) of the program to be printed to the file specified by the next command line argument. Input Settings will be given from an input file, ‘MINEFILE’ (this input file will not necessarily be named ‘MINEFILE’). There will be two different types of input: map (M) and pseudo-random (R). Map input is for your personal debugging, but pseudorandom allows easier testing of large grids. Map input mode (M) Input will be formatted as follows: ● ‘M’ – A single character indicating that this file is in map format. ● ‘Size’ – Positive integer number that specifies the size of the square grid (20 means a grid with 20 rows and 20 columns). ● ‘Start’ – Coordinate indicating the starting location, row followed by column (two integers). Map input consists of a map of all the tiles in the mine. Each tile will be separated from other tiles on the same line with whitespace (any number of spaces or tabs). There are 2 types of tiles: ● Rubble tiles which are signified by an integer between 0 and 999 (0 means the tile is already clear of rubble). ● TNT tiles, which are represented by the integer -1. Tiles are indexed as follows: ● The tile in the top left corner is at location [0,0]. ● The tile in the bottom right corner is at location [Size-1,Size-1]. Example of M input (starting location is at [1,2], or row 1 column 2, underlined): M Size: 5 Start: 1 2 9 0 9 3 3 6 9 6 8 3 9 4 1 9 0 2 0 -1 -1 9 8 3 9 7 5 Pseudorandom Mode (R) Input will be formatted as follows: © 2020 Regents of the University of Michigan 7 ● ‘R’ – character indicating that this file is in pseudorandom format. ● ‘Size’ – Number that specifies the size of the square grid (unsigned integer) ● ‘Start’ – Coordinate indicating the starting location (two unsigned integers, row first). ● ‘Seed’ – Number used to seed the random number generator (unsigned integer) ● ‘Max_Rubble’ – The max rubble value a tile can have (unsigned integer) ● ‘TNT’ – Chance that a generated tile will be a TNT tile (20 = 1 in 20 chance of a given tile being a TNT tile, 0 = no chance of TNT; also an unsigned integer) Example of R input: R Size: 5 Start: 1 2 Seed: 0 Max_Rubble: 10 TNT: 5 Generating your grid in R mode: Included in Canvas with the project spec are the files P2random.h and P2random.cpp that contain definitions for the following function: void P2random::PR_init(std::stringstream& ss, unsigned int floor_size, unsigned int seed, unsigned int max_rubble, unsigned int tnt_chance); The function P2random::PR_init(…) will set the contents of the stringstream argument (ss) so that you can use it just like you would cin for M input mode. You may find the following (incomplete) C++ code helpful in reducing code duplication. Remember, DO NOT copy/paste from a PDF file, retype the code by hand! stringstream ss; if (mode == “R”) { // Read some variables from cin P2random::PR_init(ss, floor_size, seed, max_rubble, tnt_chance); } // if // If map mode is on, read from cin; // otherwise read from the stringstream istream &inputStream = (mode == “M”) ? cin : ss; From this point on, read from inputStream; it doesn’t matter whether the mode is M or R! © 2020 Regents of the University of Michigan 8 The example R and M input files given above are equivalent. That is, both should generate the exact same map! Errors you must check for You must make sure that the input file describes a valid mine by checking for each of the following: ● The character on the first line of the file is either a ‘M’ or an ‘R’ ● The coordinate specifying the ‘Start’ is a valid location given the ‘Size’ of the grid If you detect invalid input at any time during the program, print a helpful message to cerr and exit(1). You do not need to check for input errors not explicitly mentioned here. Look in the Part A samples file, Error_messages.txt: if your program performs an exit(1) and produces one of those error messages for a valid test case, we’ll show you your error output (to help you debug). Output Default Output After completing the escape, your program should always print a summary line: Cleared tiles containing rubble and escaped. the number of tiles cleared. This number does include tiles cleared by TNT, but does not include the TNT tile itself. the total rubble cleared. This number does include rubble cleared by TNT, but clearing (detonating) the TNT tile itself counts as 0 rubble cleared. Verbose Output During the program execution, if the -v or –verbose switch is present, each time you clear a tile whose rubble amount is greater than 0, you should print: Cleared: at [,] the amount of rubble being cleared the coordinates of the tile being cleared Any TNT tiles blown up should display: TNT explosion at [,]! Any rubble tiles cleared by TNT should add the words “by TNT” to the cleared line; the general format is: © 2020 Regents of the University of Michigan 9 Cleared by TNT: at [,] The verbose mode output is produced as tiles are cleared, before the summary line. Consider getting verbose mode working early: it involves very little code, and will help you debug if you have any incorrect output. Median Output During the program execution, if the -m or –median switch is present, each time you clear a tile, you should print: Median difficulty of clearing rubble is: The median value of rubble cleared so far. This includes rubble tiles cleared by TNT, but does not include TNT tiles (-1 values should not be included in this calculation). Tiles that start out clear (rubble value 0) are not included here. Median output must display 2 decimal digits. You can use: cout << std::fixed << std::setprecision(2); at the beginning of your program to guarantee this. Stats Output After printing the summary line, if the -s or –stats option is specified print the following output (where N is the argument given to the -s flag on the command line): A. The line, “First tiles cleared:” (without quotes) followed by the first N tiles cleared in the following format: at [,] the type of the tile being cleared. If it is a rubble tile, this is the rubble amount. If this is a TNT tile, print “TNT” without the quotes , the coordinates of the tile being cleared. Remember: when a TNT tile detonates or when there is a chain reaction, all the tiles are cleared from easiest to hardest. Refer back to TNT Explosions for more details. B. The line, “Last tiles cleared:” (without quotes) followed by the last N tiles cleared in the same format as part A. The last tile cleared should be printed first, followed by the second last, etc. © 2020 Regents of the University of Michigan 10 C. The line, “Easiest tiles cleared:” (without quotes) followed by the N easiest tiles you blew up in the same format as part A in descending order (easiest tile followed by next easiest tile) D. The line, “Hardest tiles cleared:” (without quotes)’ followed by the N hardest tiles you blew up in the same format as part A. in ascending order (hardest tile followed by next hardest tile) If you have cleared less than N tiles, then simply print as many as you can. Tiles that start out clear (rubble value 0) are not included here. Full I/O Example Input (spec-M.txt and spec-R.txt): Equivalent input files (the R input file will generate a grid that looks just like the M input file): M Size: 5 Start: 1 2 9 0 9 3 3 6 9 6 8 3 9 4 1 9 0 2 0 -1 -1 9 8 3 9 7 5 R Size: 5 Start: 1 2 Seed: 0 Max_Rubble: 10 TNT: 5 Output: We ran the program with this command line: ./MineEscape -v -m -s 10 < spec-M.txt The output is shown below, with the verbose mode highlighted in blue. The median mode output is easy to identify, and statistics come after the “Cleared 6 tiles” summary line. Cleared: 6 at [1,2] Median difficulty of clearing rubble is: 6.00 Cleared: 1 at [2,2] Median difficulty of clearing rubble is: 3.50 TNT explosion at [3,2]! TNT explosion at [3,3]! Cleared by TNT: 7 at [4,3] Median difficulty of clearing rubble is: 6.00 Cleared by TNT: 9 at [4,2] Median difficulty of clearing rubble is: 6.50 © 2020 Regents of the University of Michigan 11 C l e a r e d b y T N T : 9 a t [ 2 , 3 ] M e d i a n d i f f i c u l t y o f c l e a r i n g r u b b l e i s : 7 . 0 0 C l e a r e d b y T N T : 9 a t [ 3 , 4 ] M e d i a n d i f f i c u l t y o f c l e a r i n g r u b b l e i s : 8 . 0 0 C l e a r e d 6 t i l e s c o n t a i n i n g 4 1 r u b b l e a n d e s c a p e d . F i r s t t i l e s c l e a r e d : 6 a t [ 1 , 2 ] 1 a t [ 2 , 2 ] T N T a t [ 3 , 2 ] T N T a t [ 3 , 3 ] 7 a t [ 4 , 3 ] 9 a t [ 4 , 2 ] 9 a t [ 2 , 3 ] 9 a t [ 3 , 4 ] L a s t t i l e s c l e a r e d : 9 a t [ 3 , 4 ] 9 a t [ 2 , 3 ] 9 a t [ 4 , 2 ] 7 a t [ 4 , 3 ] T N T a t [ 3 , 3 ] T N T a t [ 3 , 2 ] 1 a t [ 2 , 2 ] 6 a t [ 1 , 2 ] E a s i e s t t i l e s c l e a r e d : T N T a t [ 3 , 2 ] T N T a t [ 3 , 3 ] 1 a t [ 2 , 2 ] 6 a t [ 1 , 2 ] 7 a t [ 4 , 3 ] 9 a t [ 4 , 2 ] 9 a t [ 2 , 3 ] 9 a t [ 3 , 4 ] H a r d e s t t i l e s c l e a r e d : 9 a t [ 3 , 4 ] 9 a t [ 2 , 3 ] 9 a t [ 4 , 2 ] 7 a t [ 4 , 3 ] 6 a t [ 1 , 2 ] 1 a t [ 2 , 2 ] T N T a t [ 3 , 3 ] T N T a t [ 3 , 2 ] © 2 0 2 0 R e g e n t s o f t h e U niv e r sit y o f Mic hig a n 1 2 PART B: Priority Queue Implementation For this project, you are required to implement and use your own priority queue containers. You will implement a “sorted array priority queue”, a “binary heap priority queue”, and a “pairing heap priority queue” which implement the interface defined in Eecs281PQ.h. To implement these priority queues, you will need to fill in separate header files, SortedPQ.h, BinaryPQ.h, and PairingPQ.h, containing all the definitions for the functions declared in Eecs281PQ.h. We have provided these files with empty function definitions for you to fill in. You may also add any additional functions needed to maintain the priority queue. We provide a very bad priority queue implementation called the “Unordered priority queue” in UnorderedPQ.h, which does a linear search for the most extreme element each time it is requested. You can look at the code in this priority queue to see the use of this->compare() (more on that below), how to write your constructors, etc. There’s also an UnorderedFastPQ.h that is faster; look at how it uses mutable to accomplish that.. You can also use this priority queue to ensure that your other priority queues are returning elements in the correct order. All priority queues should return elements in the same (priority) order no matter which implementation is being used. These files specify more information about each priority queue type, including runtime requirements for each member function and a general description of the container. You are not allowed to modify Eecs281PQ.h in any way. Nor are you allowed to change the interface (names, parameters, return types) that we give you in any of the provided headers. You are allowed to add your own private helper functions and variables to the other header files as needed, so long as you still follow the requirements outlined in both the spec and the comments in the provided files. These priority queues can take in an optional comparison functor type, COMP_FUNCTOR. Inside the classes, you can access an instance of COMP_FUNCTOR with this->compare(). All of your priority queues must default to be MAX priority queues. This means that if you use the default comparison functor with an integer PQ, std::less, the PQ will return the largest integer when you call top(). Here, the definition of max (aka most extreme) is entirely dependent on the comparison functor. For example, if you use std::greater, it will become a min-PQ. The definition is as follows: If A is an arbitrary element in the priority queue, and top() returns the “most extreme” element. this->compare(top(), A) should always return false (A is “less extreme” than top()). It might seem counterintuitive that std::less<> yields a max-PQ, but this is consistent with the way that the STL std::priority_queue<> works (and other STL functions that take custom comparators, like sort()). © 2020 Regents of the University of Michigan 13 We will compile your priority queue implementations with our own code to ensure that you have correctly and fully implemented them. To ensure that this is possible (and that you do not lose credit for these tests), do not define a main function in one of the PQ headers, or any header file for that matter. Eecs281PQ<> interface Functions: push(const TYPE& val) //inserts a new element into the priority //queue top() //returns the highest priority element in the //priority_queue pop() //removes the highest priority element from //the priority queue size() //returns the size of the priority queue empty() //returns true if the priority queue is //empty, false otherwise Unordered Priority Queue The unordered priority queue implements the priority queue interface by maintaining a vector. This has already been implemented for you, and you can use the code to help you understand how to use the comparison functor, etc. Complexities and details are in UnorderedPQ.h and UnorderedFastPQ.h. Sorted Priority Queue The sorted priority queue implements the priority queue interface by maintaining a sorted vector. Complexities and details are in SortedPQ.h. This should be written almost entirely using the STL. The number of lines of code needed to get this working is fairly low, generally <= 10. Binary Heap Priority Queue Binary heaps will be covered in lecture. We also highly recommend reviewing Chapter 6 of the CLRS book. Complexities and details are in BinaryPQ.h. One issue that you may encounter is that the examples and code in the slides use 1-based indexing, but your code must store the values in a vector (where indexing starts at 0). There are three possible solutions to this problem: 1) Add a “dummy” element at index 0, make sure you never let them access it, and make sure that size() and empty() work properly. 2) Translate the code from 1-based to 0-based. This is the best solution but the hardest. 3) Use a function to translate indices for you. Instead of accessing data[i], use getElement(i). The code for getElement() is given below (both versions are needed). © 2020 Regents of the University of Michigan 14 // Translate 1-based indexing into a 0-based vector TYPE &getElement(std::size_t i) { return data[i – 1]; } // getElement() const TYPE &getElement(std::size_t i) const { return data[i – 1]; } // getElement() Pairing Priority Queue Pairing heaps are an advanced heap data structure that can be quite fast. In order to implement the pairing priority queue, read the two papers we provide you describing the data structure. Complexity details can be found in PairingPQ.h. We have also included a couple of diagrams that may help you understand the tree structure of the pairing heap. Below is the pairing heap modeled as a tree, in which each node is greater than each of its children: To implement this structure, the pairing heap will use child and sibling pointers to have a structure like this: © 2020 Regents of the University of Michigan 15 Implementing the Priority Queues Look through the included header files: you need to add code in SortedPQ.h, BinaryPQ.h, and PairingPQ.h, and this is the order that we would suggest implementing the different priority queues. Each of these files has TODO comments where you need to make changes. We wanted to provide you with files that would compile when you receive them, so some of the changes involve deleting and/or changing lines that were only placed there to make sure that they compile. For example, if a function was supposed to return an integer, NOT having a return statement that returns an integer would produce a compiler error. Also, functions which accept parameters have had the name of the parameter commented out (otherwise you would get an unused parameter error). Look at UnorderedPQ.h as an example, it’s already done. When you implement each priority queue, you cannot compare data yourself using the < operator. You can still use < for comparisons such as a vector index to the size of the vector, but you must use the provided comparator for comparing the data stored inside your priority queue. Notice that Eecs281PQ.h contains a member variable named compare of type COMP (one of the templated class types). Although the other classes inherit from Eecs281PQ.h, you cannot access the compare member directly, you must always say this->compare() (this is due to a template inheriting from a template). Notice that in UnorderedPQ<> it uses this->compare() by passing it to the max_element() algorithm to use for comparisons. When you write the SortedPQ<> you cannot use binary_search() from the STL, but you wouldn’t want to: it only returns a bool to tell you if something is already in the container or not! Instead use the lower_bound() algorithm (which returns an iterator), and you can also use the sort() algorithm — you don’t have to write your own sorting function. You do however have to pass the this->compare functor to both lower_bound() and sort(). The BinaryPQ<> is harder to write, and requires a more detailed and careful use of the comparison functor, and you have to know how one works to write one in the first place, even for UnorderedPQ<> to use. See the About Comparators section below. © 2020 Regents of the University of Michigan 16 Compiling and Testing Priority Queues You are provided with a testing file, testPQ.cpp. testPQ.cpp contains examples of unit tests you can run on your priority queues to ensure that they are correct; however, it is not a complete test of your priority queues; for example it does not test updatePriorities(). It is especially lacking in testing the PairingPQ<> class, since it does not have any calls to addNode() or updateElt(). You should add more tests to this source code file. Using the 281 Makefile, you can compile testPQ.cpp by typing in the terminal: make testPQ. You may use your own Makefile, but you will have to make sure it does not try to compile your driver program as well as the test program (i.e., use at your own risk). Logistics The std::priority_queue<> The STL std::priority_queue<> data structure is basically an efficient implementation of the binary heap which you are also coding in BinaryPQ.h. To declare a std::priority_queue<> you need to state either one or three types: 1) The data type to be stored in the container. If this type has a natural sort order that meets your needs, this is the only type required. 2) The underlying container to use, usually just a std::vector<> of the first type. 3) The comparator to use to define what is considered the highest priority element. If the type that you store in the container has a natural sort order (i.e. it supports operator<()), the std::priority_queue<> will be a max-heap of the declared type. For example, if you just want to store integers, and have the largest integer be the highest priority: std::priority_queue pqMax; When you declare this, by default the underlying storage type is vector and the default comparator is less. If you want the smallest integer to be the highest priority: std::priority_queue<int, vector, greater> pqMin; If you want to store something other than integers, define a custom comparator as described below. About Comparators The functor must accept two of whatever is stored in your priority queue: if your PQ stores integers, the functor would accept two integers. If your PQ stores pointers to units, your functor would accept two pointers to orders (actually two const pointers, since you don’t have to modify units to compare them). © 2020 Regents of the University of Michigan 17 Your functor receives two parameters, let’s call them a and b. It must always answer the following question: is the priority of a less than the priority of b? What does lower priority mean? It depends on your application. For example, refer back to the Breaking Out of the Mine section: if you have multiple tiles in your priority queue, you determine priority based on smallest rubble value. If rubble value is the same, break ties based on column or row number. When you would have wanted to write a comparison, such as: if (data[i] < data[j]) You would instead write: if (this->compare(data[i], data[j]) Your priority queues must work in general. In general, a priority queue has no idea what kind of data is inside of it. That’s why it uses this->compare() instead of <. What if you wanted to perform the comparison if (data[i] > data[j])? Use the following: if (this->compare(data[j], data[i]) Libraries and Restrictions For part A, we encourage the use of the STL, with the exception of these prohibited features: ● The thread/atomics libraries (e.g., boost, pthreads, etc) which spoil runtime measurements. ● Smart pointers (both unique and shared). You are allowed to use std::vector<>, std::priority_queue<> and std::deque<>. You are not allowed to use other STL containers. Specifically, this means that use of std::stack<>, std::queue<>, std::list<>, std::set<>, std::map<>, std::unordered_set<>, std::unordered_map<>, and the ‘multi’ variants of the aforementioned containers are forbidden. For part B, You are allowed to use std::sort(). You are allowed to use std::lower_bound(). You are not allowed to use std::partition(), std::partition_copy(), std::partial_sort(), std::stable_partition(), std::make_heap(), std::push_heap(), std::pop_heap(), std::sort_heap(), std::qsort(), or anything that trivializes the implementation of the binary heap. © 2020 Regents of the University of Michigan 18 You are not allowed to use the C++14 regular expressions library (it is not fully implemented on gcc 6.2) or the thread/atomics libraries (it spoils runtime measurements). You are not allowed to use other libraries (eg: boost, pthread, etc). Furthermore, you may not use any STL component that trivializes the implementation of your priority queues (if you are not sure about a specific function, ask us). Your main program (part A) must use std::priority_queue<>, but your PQ implementations (part B) must not. Testing and Debugging Part of this project is to prepare several test files that will expose defects in the program. We strongly recommend that you first try to catch a few of our buggy solutions with your own test files, before beginning your solutions. This will be extremely helpful for debugging. The autograder will also tell you if one of your own test files exposes bugs in your solution. Test File Details Your test files must be valid Map mode input files. We will run your test files on several buggy project solutions. If your test file causes a correct program and the incorrect program to produce different output, the test file is said to expose that bug. Test files should be named test-n-.txt where 0 < n <= 10. The portion of the name should contain at least one of ‘m’ (median), ‘s’ (statistics), and/or ‘v’ (verbose). You must specify at least one flag; you can specify two or three if you want. If the ‘s’ flag is specified, the autograder will pick the number of statistics based on the test number (a larger value of n will generate more statistics). For example, test-1-vs.txt is a valid file name. Your test files must be in map input mode and cannot have a size larger than 15. You may submit up to 10 test files (though it is possible to get full credit with fewer test files). The tests the autograder runs on your solution are NOT limited to having a Size of 15; your solution should not impose any size limits (as long as sufficient system memory is available). However you can assume that an int (signed or unsigned) can hold the size, row or column number, amount of rubble, etc. © 2020 Regents of the University of Michigan 19 Submitting to the Autograder Do all of your work (with all needed files, as well as test files) in some directory other than your home directory. This will be your “submit directory”. You can make two separate projects inside of your IDE: one for part A, another for part B. Before you turn in your code, be sure that: ● Every source code and header file contains the following project identifier in a comment at the top of the file: // Project Identifier: 19034C8F3B1196BF8E0C6E1C0F973D2FD550B88F ● The Makefile must also have this identifier (in the first TODO block). ● DO NOT copy the above identifier from the PDF! It might contain hidden characters. Copy it from the README file instead (this file is included on Canvas). ● You have deleted all .o files and any executables. Typing ‘make clean’ should accomplish this. ● Your makefile is called Makefile. Typing ‘make -R -r’ builds your code without errors and generates an executable file called MineEscape. The command line options -R and -r disable automatic build rules, which will not work on the autograder. ● Your Makefile specifies that you are compiling with the gcc optimization option -O3. This is extremely important for getting all of the performance points, as -O3 can often speed up execution by an order of magnitude. You should also ensure that you are not submitting a Makefile to the autograder that compiles with -g, as this will slow your code down considerably. If your code “works” when you don’t compile with -O3 and breaks when you do, it means you have a bug in your code! ● Your test files are named test-n-MODE.txt and no other project file names begin with test. Up to 10 test files may be submitted. ● The total size of your solution and test files does not exceed 2MB. ● You don’t have any unnecessary files (including temporary files created by your text editor and compiler, etc) or subdirectories in your submit directory (i.e. the .git folder used by git source code management). ● Your code compiles and runs correctly using version 6.2.0 of the g++ compiler. This is available on the CAEN Linux systems (that you can access via login.engin.umich.edu). Even if everything seems to work on another operating system or with different versions of GCC, the course staff will not support anything other than GCC 6.2.0 running on Linux (students using other compilers and OS did observe incompatibilities). In order to compile with g++ version 6.2.0 on CAEN you must put the following at the top of your Makefile (or use the one that we’ve provided): PATH := /usr/um/gcc-6.2.0/bin:$(PATH) LD_LIBRARY_PATH := /usr/um/gcc-6.2.0/lib64 LD_RUN_PATH := /usr/um/gcc-6.2.0/lib64 © 2020 Regents of the University of Michigan 20 Turn in all of the following files: ● All your .h and .cpp files for the project (solution and priority queues) ● Your Makefile ● Your test files You must prepare a compressed tar archive (.tar.gz file) of all of your files to submit to the autograder. One way to do this is to have all of your files for submission (and nothing else) in one directory. Go into this directory and run one of these two commands (assuming you are using our Makefile): make fullsubmit (builds a “tarball” named fullsubmit.tar.gz that contains all source and header files, test files, and the Makefile; this file is to be submitted to the autograder for any completely graded submission) make partialsubmit (builds a “tarball” named partialsubmit.tar.gz that contains only source and header files, and the Makefile; test files are not included, which will speed up the autograder by not checking for bugs; this should be used when testing the simulation only) For Part A, you can submit test files (map files), for Part B you only submit code. These commands will prepare a suitable file in your working directory. Submit your project files directly to either of the two autograders at: https://g281-1.eecs.umich.edu/ or https://g281-2.eecs.umich.edu/. You can safely ignore and override any warnings about an invalid security certificate. When the autograders are turned on and accepting submissions, there will be an announcement on Piazza. The autograders are identical and your daily submission limit will be shared (and kept track of) between them. You may submit up to 2 times per calendar day, per part, with autograder feedback (more in Spring). For this purpose, days begin and end at midnight (Ann Arbor local time). We will use your best submission when running final grading. Part of programming is knowing when you are done (when you have achieved your task and have no bugs); this is reflected in this grading policy. We strongly recommend that you use some form of revision control (ie: SVN, GIT, etc) and that you ‘commit’ your files every time you upload to the autograder so that you can always retrieve an older version of the code as needed. Please refer to your discussion slides and Canvas regarding the use of version control. If you use late days on this project, it is applied to both parts. So if you use one late day for Part A, you automatically get a 1-day extension for Part B, but are only charged for the use of one late day. Please make sure that you read all messages shown at the top section of your autograder results! These messages will help explain some of the issues you are having (such as losing points for having a bad Makefile). Also be sure to check whether the autograder shows that one of your own test files exposes a bug in your solution (at the bottom of the student test files section). Watch for the word “Hint” in the autograder feedback about specific test cases. © 2020 Regents of the University of Michigan 21 Grading ● 60 pts total for part A ○ 45 pts — correctness & performance ○ 5 pts — memory leaks ○ 10 pts — student-provided test files ● 40 pts total for part B ○ 20 pts — pairing heap correctness & performance ○ 5 pts — pairing heap memory leaks ○ 10 pts — binary heap correctness & performance ○ 5 pts — sorted heap correctness & performance We also reserve the right to deduct up to 5 points for bad programming style, code that is unnecessarily duplicated, etc.. Refer to the Project 1 spec for details about what constitutes good/bad style, and remember: It is extremely helpful to compile your code with the gcc options (default in our Makefile): -Wall -Werror -Wextra -Wvla -pedantic. This will help you catch bugs in your code early by having the compiler point out when you write code that is either of poor style or might result in unintended behavior. Appendix A: Printing out the Grid While this is never required for the project assignment, you may find it helpful to be able to print out the state of the grid in a human readable format. Unfortunately, the values of rubble may have a different number of digits, which can make it hard to read. For example, consider the following grid: 10 5 100 200 400 200 1 1 -1 It is fairly hard to see which tiles touch which other tiles because they do not line up well. Luckily, the header contains the definition for the std::setw() stream manipulator. The following code snippet gives a quick example of how to use it: #include #include #include © 2020 Regents of the University of Michigan 22 using namespace std; int main() { vector ints1 = {0, 24, 100, 5}; vector ints2 = {100, 2, 40, 2}; for(auto i : ints1) cout << setw(4) << i; cout << endl; for(auto i : ints2) cout << setw(4) << i; cout << endl; return 0; } // main() Output: 0 24 100 5 100 2 40 2 In the previous project, using iomanip caused a loss of points; in this project it is allowed by the autograder (you will need iomanip to set the digits of precision for the median output). Appendix B: Tips (mostly PairingPQ, but others also) There are helpful videos on the EECS 281 YouTube channel, specifically the running median: https://youtu.be/_T63Cajeyhw (which is used for many different versions of Project 2). Here is a video about the Mine Escape: https://youtu.be/ICgHr6ZbUDk. Here is a video all about Part B: https://youtu.be/kOqzBMvxfuw. There’s also a video recorded a few years ago by a student (during office hours) about the Pairing PQ: https://youtu.be/irsHBpw2fhE. Start coding the mine escape first, but keep thinking about the priority queue implementations. When you code them, the suggested order is SortedPQ, BinaryPQ, PairingPQ. When you’re working on statistics output, be efficient. You can use something from this project to make the easiest/hardest tiles cleared more efficient. You will need TWO priority queues in Part A: one for things discovered, and another to handle the TNT chain explosion. Remember, once a TNT explosion starts, it must finish! For Part A, resist the urge to use pointers! If you point to somewhere within a 2D vector, how do you know it’s row and column number for printing? Trust us when we say that you’ll get enough pointers in the PairingPQ. You can always count on the UnorderedPQ.h being correct. If your testPQ.cpp works with this © 2020 Regents of the University of Michigan 23 and doesn’t work with one of your other PQ implementations, it is most likely a bug in that PQ implementation. Make sure ALL of your constructors initialize ALL of your member variables (but objects like a vector might be fine with just being default-constructed). This is most likely to be a problem with Pairing, but check the others also. Even if you think it’s working, run your Pairing Heap with valgrind to check for memory errors and/or leaks. This is good advice for the entire project. Verbose mode doesn’t require a lot of extra work and can make it easier to debug your program. For example, if a tile gets blown up in the wrong order, it would be very useful to know that right away, rather than wondering why the summary line has the wrong number of tiles or rubble cleared. When you are writing a derived class and want to use the compare member variable (that is already declared in the base class), you must use this->compare. This is because you are working on a templated class that inherits from a templated class (referred to in C++ terms as a dependent context). Your comparator should always answer the same question! If called with two parameters (let’s say a, b), it should return true if: priority(a) < priority(b). Make sure that each of your PQ implementation files contains every #include needed by that file! You might be including deque from your main.cpp, and counting on that file existing for PairingPQ.h. You would then be unable to compile when we test your PQ implementations with our .cpp file. The rest of these tips are specific to the Pairing Heap. After you’re done reading about the Pairing Heap data structure, plan on several days of coding, testing, and optimizing JUST for PairingPQ.h. DON’T USE RECURSION, it’s very easy to have a large pairing heap that causes a stack overflow. You can add other private member variables and functions, such as the current size and a meld() function. You are not allowed to change from sibling and child pointers to a deque or vector of children pointers. You don’t want to: this is slower than using sibling/child., and the autograder timings are based on the sibling/child approach. You are allowed to add one more pointer to the Node structure. You can use either a parent (pointing up) or a previous (points left except leftmost points to parent). You can also add a public function to return it if you want to. We didn’t put this variable or function in the starter file because some students want parent, some want previous. Neither is strictly a better choice than the other; each makes some part of Pairing harder, some part easier; both can pass the timings. Pointers passed to meld() must each point to the “root” node of a Pairing Heap. Root nodes never © 2020 Regents of the University of Michigan 24 have siblings! This is important to making meld() as simple and as fast as it should be. Don’t write a copy constructor and copy the code for operator=()! Use the copy-swap method outlined in the “Arrays and Containers” lecture. You are allowed to write an extra function, such as print() to display the entire pairing heap, and only use it for testing. When you need to traverse an entire pairing heap (print, copy constructor, destructor, etc.), think about the Project 1 approach! Use a deque, add the “starting location”, while it’s not empty take the “next” one out, add things nearby, do something with the “next” one that you took out, etc. Appendix C: Autograder Information The test cases for Part A on the autograder (that show in the table) are for part A, and have the following naming convention: ● “INV”: ○ The test case is an invalid input file. Your solution should exit(1) because of an error. ● First letter (for anything not invalid): ○ ‘M’ (medium), ‘L’ (large), and ‘XL’ (extra-large) denote the size of the test case. ○ ‘S’ indicates that the test is the one given in this spec (see Full I/O Example). ○ ‘E’ indicates that the test case is some sort of hand-written edge case. ○ ‘R’ indicates that the grid contains only rubble (possibly 0s, but never any TNT).. ○ ‘N’ indicates that the grid contains no rubble or TNT (i.e. it’s all 0s). ○ ‘T’ indicates that the grid is made entirely of TNT squares. ● Second letter: ○ ‘M’ denotes map input format and ‘R’ denotes pseudo-random input format. ● Lower case letters after a hyphen: ○ ‘m’: The test is run with the median flag on. ○ ‘v’: The test is run with the verbose flag on. ○ ‘s’: The test is run with the statistics flag on. When you start submitting test files to the autograder, it will tell you (in the section called “Scoring student test files”) how many bugs exist, the number needed to start earning points, and the number needed for full points. It will also tell you how many are needed to start earning an extra submit/day! For Part B, when your priority queues (SortedPQ, BinaryPQ, and PairingPQ) are compiled with our main(), we will perform unit testing on your data structures. These test cases all have three-letter names: 1) First letter: Binary PQ, Sorted PQ, Pairing PQ 2) Second letter: Push, Range, Update priorities, Addnode, updateElt, Copy constructor, Operator = © 2020 Regents of the University of Michigan 25 3) Third letter: Small, Medium, Large, eXtra-large A “push” test uses the .push() member function to insert numerous values into your priority queue. After that, the values will be checked via .top() and .pop() until the container is empty, to make sure that every value came out in the correct order. If the “push” test goes over on time, it might be the fault of your .pop(), not your .push(), because both must be called to verify that your container works properly. A “range” test uses the range-based constructor, from [start, end) to insert values into your container, then the test proceeds as described above for the “push” test. The start iterator is inclusive while the end is exclusive, as is normal for the STL. The “update priorities” tests use .push() to fill your container, then half of the values that were given to you are modified (hint, this is accomplished with pointers). After .updatePriorities() is called, all of the values are popped out and tested as above. The first three tests above are run for every priority queue type. The ones below are run only on the PairingPQ. The “addNode” tests use .addNode() to fill the container instead of .push(), and every value is checked through the returned pointer to make sure that it matches. The “updateElt” tests use .addNode() to fill the container, then half of the values have their priority increased by a random amount using .updateElt(). After that, values are popped off one at a time, checking to make sure that each value is correct. The “copy constructor” and “operator=” tests first fill one PairingPQ using .push(). Then they use the stated method to create a second PairingPQ from the first. Lastly every value is popped from both priority queues, making sure that every value is correct (and thus ensuring that a deep copy was performed, not a shallow copy). For the Pairing Heap, the .addNode() member function makes a promise: the item that was added will ALWAYS live at the pointer returned, until the user removes it via .pop() (or the destructor). When implementing .updateElt() and .updatePriorities(), you cannot delete existing nodes and create new ones, as this would break the promise made by .addNode()! You must move nodes around.
EECS 281 Project 3: SillyQL
Overview A relational database is the most common means of data storage and retrieval in the modern age. A relational database model stores data into one or more tables, where each table has columns and rows. The columns represent a value that can be attributed to an entry (such as “color”, “name”, or “ID”) and the rows represent individual entries or records (such as “Paoletti”, “my cat”, or “BBB 1695”). You may find it helpful to think about rows as objects and columns as descriptors for those objects. For instance, the table pictured to the right has each row corresponding to a car (a data type), and the columns group a car’s vendors, model, etc. such that this information can be easily retrieved. Rows in relational databases are also called records or tuples, but in this project we will use the terminology “row” for consistency. Rows in Project 3 are not guaranteed to be unique. However, a database can do much more than simply store information. You must be able to retrieve specific information in an efficient manner. For relational databases, the structured query language (SQL) is a defined method of retrieving information programmatically. It looks like real English, where a valid command for the above table could be “SELECT Model from Cars marketplace where Price < 30000”, which would return a list of models [Corvette, Corvette, Malibu, Malibu, Malibu]. Note that the version of SQL used in this project is simplified and modified somewhat from a typical query language. For this project, you will write a program to emulate a basic relational database with an interface based on a subset of a standard query language. Your executable will be called silly. You will gain experience writing code that makes use of multiple interacting data structures. The design portion of this project is significant; spend plenty of time thinking about what data structures you will use and how they will interact. Version 2020-10-25 1 Current version by: Cris Noujaim, Shahab Tajik, Ankit Shah, and David Paoletti Originally composed by: David Snider and Genevieve Flaspohler Special thanks to Waleed Khan Table of Contents Overview 1 Table of Contents 2 Learning Goals 3 Project Specification 3 Program Arguments 3 User Commands 3 Table Manipulation Commands 4 CREATE – add a new table to the database 4 INSERT INTO – insert new data into table 4 DELETE FROM – delete specific data from table 5 GENERATE INDEX – create a search index on the specified column 6 PRINT – print specified rows 7 JOIN – join two tables and print result 8 REMOVE – remove existing table from the database 10 QUIT – exit the program 10 # – comment / no operation (useful for adding comments to command files) 10 Error Checking 11 Testing and Debugging 11 Submission to the autograder 12 Libraries and Restrictions 13 Grading 13 Checkpoint 13 Test Cases 14 Appendix A: Example from the Spec 16 Appendix B: Variant Types and You 17 Appendix C: Project Tips, Tricks, and Things to Avoid 18 2 Learning Goals ● Selecting appropriate data structures for a given problem. Now that you know how to use various abstract data types, it’s important that you are able to evaluate which will be the optimal for a set of tasks. ● Evaluating the runtime and storage tradeoffs for storing and accessing data contained in multiple data structures. ● Designing a range of algorithms to handle different situations ● Evaluating different representations of the same data. Project Specification Program Arguments silly should accept the following (both optional) command line arguments: ● -h, –help This causes the program to print a helpful message about how to use the program and then immediately exit(0). ● -q, –quiet This causes the program to run in quiet mode. In quiet mode, your program will run very similarly to normal, except that it will print only numerical summaries of the rows affected by the command, not any of the actual data. Quiet mode exists so that we may stress test your program without overloading the autograder with too much output. This flag only affects the JOIN and PRINT commands and specific instructions for quiet mode output is given for these commands. Otherwise, if there is no mention of quiet mode with respect to a given piece of output, you may assume it is always printed. You can implement this feature last; with a well-built project, adding this functionality should be very simple. You may find it simpler to check for these flags directly rather than use getopt. User Commands On startup, your program should prepare to accept commands from the user. These commands may or may not take the form of redirected input. Your program should print “% ” as a prompt to the user before reading each command. Commands are specified by a command keyword followed by a required space character and then the command-specific arguments where applicable. For example, to delete the rows from table1 where the entries in column name equal “John”, one would input: DELETE FROM table1 WHERE name = John. Appendix A contains an example program illustrating the use of these commands. 3 Table Manipulation Commands In the following commands, output examples are given. These examples are cumulative throughout the table manipulation command part of the spec (i.e. the table created in the example output for the CREATE command is the same table that DELETE is performed on during the delete command, and so on, so you can follow the state of the table throughout the spec). CREATE – add a new table to the database Syntax: CREATE … … Creates a new table with columns (where N > 0). Each column contains data of type and is accessed with the name . Table names and column names are guaranteed to be space-free. No two columns in the same table will have the same name (you do not need to check). Valid data types for coltype are {double, int, bool, string}. This table is initially empty. Output: Print the following on a single line followed by a newline: New table with column(s) … created Possible errors: 1. A table named already exists in the database Given the following as input: CREATE 281class 3 string string bool emotion person Y/N The output should be: New table 281class with column(s) emotion person Y/N created INSERT INTO – insert new data into table Syntax: INSERT INTO ROWS … … … … Inserts new rows (where N > 0) into the table specified by . The number of values in each line after the first, or M in the example, is guaranteed to be equal to the number of columns in the table. The first value, , should be inserted into the first column of the table in the first inserted row, the second value, , into the second column of the table in the first inserted row, and so on. Additionally, the types of the values are guaranteed to be the same as the types of the 4 columns they are inserted into. For example, if the second column of the table contains integers, is guaranteed to be an int. Further, string items are guaranteed to be a single string of whitespace delimited characters (i.e. “foo bar” is invalid, but “foo_bar” is acceptable). Output: Print the message shown below, followed by a newline, where is the number of rows inserted, is the index of the first row added in the table, and is the index of the last row added to the table, 0 based. So, if there were K rows in the table prior to insertion, = K, and = K + N – 1. Added rows to from position to Possible errors: 1. is not the name of a table in the database Given the following as input: INSERT INTO 281class 8 ROWS happy Darden true stressed students false busy office_hours true stressed students true stressed Paoletti true happy Darden true happy Sith true victorious Sith true The output should be: Added 8 rows to 281class from position 0 to 7 DELETE FROM – delete specific data from table Syntax: DELETE FROM WHERE Deletes all rows from the table specified by where the value of the entry in satisfies the operation with the given value . You can assume that will always be of the same type as . For example, to delete all rows from table1 where the entries in column name equal “John”, the command would be: DELETE FROM table1 WHERE name = John. Or, to delete all rows from tableSmall where the entries in column size are greater than 15, the command would be: DELETE FROM tableSmall WHERE size > 15. For simplicity, is strictly limited to the set { <, > , = }. Output (with example): Print the number of rows deleted from the table as shown below, followed by a newline: Deleted rows from 5 Possible errors: 1. is not the name of a table in the database. 2. is not the name of a column in the table specified by Given the following as input: DELETE FROM 281class WHERE person = Darden The output should be: Deleted 2 rows from 281class The search is case sensitive (which makes it easier to code): if we had deleted WHERE person = darden, no rows would have been removed. GENERATE INDEX – create a search index on the specified column Syntax: GENERATE FOR INDEX ON Directs the program to create an index of the type on the column in the table , where is strictly limited to the set {hash, bst}, denoting a hash table index and a binary search tree index respectively. Given the hash on column , the program should create a hash table that allows a row in the table to be found rapidly given a particular value in the column . Given the bst on column , the program should create an binary search tree that allows rows in the table to be found rapidly given a particular value in the column . Only one user-generated Index may exist per table, at any one time. If an index is requested on a table that already has one, discard the old index before building the new one. When bst is the specified index type, you should make use of a std::map<>; when hash is the specified index type, you should utilize a std::unordered_map<>. It is acceptable for both types to exist at the same time, but only one (at most) should be in use at any given time (i.e. contain data). An index is a tool used in databases to speed up future commands. A hash index creates a hash table, which associates values in a specified column with a collection of rows in the table for which the index was created. Similarly, a bst index creates a binary search tree which associates values in a specified column with a collection of rows. Both are useful for different types of commands. In order to get the correct output in all cases, you must use an index when it is appropriate. Further, you must remember to update your indices upon edits to the table. bst indices are ordered by operator< for the type in the specified column. Output: Print the message shown below, followed by a newline Created index for table on column Possible errors: 6 1. is not the name of a table in the database. 2. is not the name of a column in the table specified by Given the following as input: GENERATE FOR 281class hash INDEX ON emotion The output should be: Created hash index for table 281class on column emotion Note that in this example, the user program should now have created an index of type hash table that maps from specific emotions of type string to rows in the table 281class,allowing the program to search for all rows where emotion = happy quickly, among other things. PRINT – print specified rows Syntax: PRINT FROM … [WHERE | ALL ] Directs the program to print the columns specified by , , … from some/all rows in . If there is no condition (i.e. statement is of the form PRINT … ALL), the matching columns from all rows of the table are printed. If there is a condition (i.e. statement is of the form PRINT … WHERE ), only rows, whose value pass the condition, are printed. The rules for the conditional portion are the same as for the DELETE FROM statement. It is not guaranteed that the columns in the command are listed in the same order as they exist in the table, nor is it guaranteed that all columns will be listed. The table must be searched as follows to ensure compatibility with the autograder: 1. If no index exists or there is a hash index on the conditional column, the results should be printed in order of insertion into the table. 2. If a bst index exists on the conditional column, the results should be printed in the order in the BST (least item to greatest item for std::map<> constructed with the default std::less<> operator), with ties broken by order of insertion into the table. Output : Print the names of the specified columns, followed by the values of each of the specified columns in each row, separated by space. Every line should be followed by a newline. … … … … Once all the data has been printed, print the following, followed by a newline, where is the number of rows printed: 7 Printed matching rows from In quiet mode, do not print the s or any of the values. Print only the following: Printed matching rows from Possible errors: 1. is not the name of a table in the database. 2. is not the name of a column in the table specified by 3. One (or more ) of the s are not the name of a column in the table specified by (only print the name of the first such column encountered) Given the following as input: PRINT FROM 281class 2 person emotion WHERE Y/N = true The output should be: person emotion office_hours busy students stressed Paoletti stressed Sith happy Sith victorious Printed 5 matching rows from 281class Or in quiet mode: Printed 5 matching rows from 281class JOIN – join two tables and print result Syntax: JOIN AND WHERE = AND PRINT <1|2> <1|2> … <1|2> Directs the program to print the data in columns, specified by , , … . The s will be the names of columns in either the first table or the second table , as specified by the <1/2> argument directly following each . The JOIN command is unique in that it accesses data from multiple tables. The rules for the conditional portion are the same as for the DELETE FROM statement except that for the JOIN command, will be strictly limited to {=} for simplicity. It is not guaranteed that the columns are listed in the same order as they exist in the table, nor is it guaranteed that all columns will be listed. The JOIN must be accomplished as follows in order to insure compatibility with the autograder: 1. Iterate through the first table from beginning to end 8 2. For each row’s respective value in , find matching values in , if any exist 3. For each match found, print the column values in the matching rows in the order specified by the JOIN command 4. The matching rows from the second table must be selected in the order of insertion into that table. 5. If no rows in the second table match a row in the first table, that row is ignored from the join. Output : Print the names of the specified columns, followed by the values of each of the specified columns in each row, separated by space. Every line should be followed by a newline. … … … … Once all the data has been printed, print the following, followed by a newline, where N is the number of rows printed Printed rows from joining to In quiet mode, do not print the s or any of the values. Print only the following: Printed rows from joining to Possible errors: 1. is not the name of a table in the database. 2. One (or more ) of the s or s are not the name of a column in the table specified by (only print the name of the first such column encountered) Given the following as input: CREATE pets 3 string bool bool Name likes_cats? likes_dogs? INSERT INTO pets 2 ROWS Sith true true Paoletti true false JOIN pets AND 281class WHERE Name = person AND PRINT 3 Name 1 emotion 2 likes_dogs? 1 The join specific output should be (Note: the CREATE and INSERT INTO commands will generate their own output, but for simplicity, they are not included in this example): Name emotion likes_dogs? Sith happy true Sith victorious true Paoletti stressed false Printed 3 rows from joining pets to 281class Or in quiet mode: Printed 3 rows from joining pets to 281class 9 Note that the JOIN is case sensitive and does not create a new table. REMOVE – remove existing table from the database Syntax: REMOVE Removes the table specified by and all associated data from the database, including any created index. Output: Print a confirmation of table deletion, followed by a newline, as follows: Table deleted Possible errors: 1. is not the name of a table in the database Given the following as input: REMOVE pets REMOVE 281class The output should be: Table pets deleted Table 281class deleted QUIT – exit the program Syntax: QUIT Cleans up all internal data (i.e. no memory leaks) and exits the program. Note that the program must exit with a return 0 from main(). Output: Print a goodbye message, followed by a newline. Thanks for being silly! Possible errors: None, except for lacking a QUIT command. Every interactive session or redirected input file should end with a QUIT command. # – comment / no operation (useful for adding comments to command files) 10 Syntax: # Any text on a line that begins with # is ignored Discard any lines beginning with #. This command does not produce any output nor can it generate any errors. Error Checking Except for the errors specifically noted above and below, we will not test your error handling. However, we recommend that you implement a robust error handling and reporting mechanism to make your own testing and debugging easier. Normally, you would print error messages to the standard error stream (stderr via cerr), rather than cout. For this project, however, you must print the specified error messages to stdout via cout so that they may be tested in conjunction with the rest of the project. DO NOT exit() when one of these errors occurs, display the error and keep running. As stated in the Table Manipulation Commands section, there are a few specific errors that your code must check for and print the appropriate output. They are as follows: Error 1: A table named already exists in the database Output: Error: Cannot create already existing table Error 2: is not the name of a table in the database Output: Error: does not name a table in the database Error 3: is not the name of a column in the table specified by Output: Error: does not name a column in Error 4: Unrecognized first letter of command (i.e. not one of CREATE, PRINT, REMOVE, #, etc) Output: Error: unrecognized command For all input errors, you should print the matching response, with the braced variables replaced with the items from the offending command and followed by a newline, clear the rest of the command input, and reprompt the user (“% “) as usual. For most commands, clearing the rest of the input line should suffice. If there is an error in the insert command, however, the program should clear as many lines as there are rows to be inserted so that the command is fully flushed. Do not terminate the program. Other than the errors noted above, commands will be well formed. For simplicity, this also holds true when clearing away an erroneous command. Testing and Debugging A major part of this project is to prepare a suite of test files that will help expose defects in your program. Each test file should be a text file containing a series of table manipulation commands to run. Your test files will be run against several buggy project solutions. If your test file causes the correct program and an incorrect program to produce different output, the test file is said to expose that bug. 11 Test files should contain valid SillyQL commands, and should be named test-n-table-commands.txt, where 0 < n ≤ 15. Make sure that every test file ends in a QUIT command. Your test files may contain no more than 35 lines in any one file. You may submit up to 15 test files (though it is possible to get full credit with fewer test files). Note that the tests the autograder runs on your solution are NOT limited to 35 lines in a file; your solution should not impose any size limits (as long as sufficient system memory is available). Submitting to the autograder Do all of your work (with all needed files, as well as test files) in some directory other than your home directory. This will be your “submit directory”. Before you turn in your code, be sure that: ● Every source code and header file contains the following project identifier in a comment at the top of the file: ● // Project Identifier: C0F4DFE8B340D81183C208F70F9D2D797908754D ● The Makefile must also have this identifier (in the first TODO block). ● You have deleted all .o files and your executable(s). Your Makefile should include a rule or rules that cause make clean to accomplish this. ● Your makefile is called Makefile. To confirm that your Makefile is behaving appropriately, check that “make -R -r” builds your code without compiler errors and generates an executable file called silly. (Note that the command line options -R and -r disable automatic build rules, which will not work on the autograder). ● Your Makefile specifies that you are compiling with the gcc optimization option -O3 (This is the letter “O,” not the number “0”). This is extremely important for getting all of the performance points, as -O3 can speed up code by an order of magnitude. ● Your test files have names of the form test-n-table-commands.txt, and no other project file names begin with “test.” Up to 15 test files may be submitted. ● The total size of your program and test files does not exceed 2MB. ● You don’t have any unneeded files in your submit directory. ● Your code compiles and runs correctly using version 6.2.0 of the g++ compiler. This is available on the CAEN Linux systems (that you can access via login.engin.umich.edu). Even if everything seems to work on another operating system or with different versions of gcc, the course staff will not support anything other than gcc 6.2.0 running on Linux. To compile with g++ version 6.2.0 on CAEN you must put the following at the top of your Makefile (or use our provided Makefile): PATH := /usr/um/gcc-6.2.0/bin:$(PATH) LD_LIBRARY_PATH := /usr/um/gcc-6.2.0/lib64 LD_RUN_PATH := /usr/um/gcc-6.2.0/lib64 Turn in the following files: ● All of your .h and .cpp files for the project ● Your Makefile ● Your test files 12 You must prepare a compressed tar archive (.tar.gz file) of all of your files to submit to the autograder. One way to do this is to have all of your files for submission (and nothing else) in one directory. Go into this directory and run this command: % dos2unix *; tar cvzf submit.tar.gz *.cpp *.h Makefile test*.txt to prepare a suitable file in your working directory. Submit your project files directly to either of the two autograders at: https://g281-1.eecs.umich.edu or https://g281-2.eecs.umich.edu. Note that when the autograders are turned on and accepting submissions, there will be an announcement on Piazza. The auto graders are identical and your daily submission limit will be shared (and kept track of) between them. You may submit up to two times per calendar day with autograder feedback. For this purpose, days begin and end at midnight (Ann Arbor local time). We will count only your best submission for your grade. If you would instead like us to use your LAST submission, see the autograder FAQ page, or use this form. We strongly recommend that you use some form of revision control (ie: SVN, GIT, etc) and that you “commit” your files every time you upload to the autograder so that you can always retrieve an older version of the code as needed. Please refer to your discussion slides and CTools regarding the use of version control. Please make sure that you read all messages shown at the top section of your autograder results! These messages often help explain some of the issues you are having (such as losing points for having a bad Makefile or why you are segfaulting). Also be sure to note if the autograder shows that one of your own test files exposes a bug in your solution (at the bottom). Libraries and Restrictions The use of the C/C++ standard libraries is highly encouraged for this project, especially functions in the header and container data structures. The smart pointer facilities, and thread/atomics libraries are prohibited. As always, the use of libraries not included in the C/C++ standard libraries is forbidden. Grading ● 80 points — Your grade will be derived from correctness and performance (runtime). Details will be determined by the autograder. ● 10 points — Test file coverage (effectiveness at exposing buggy solutions). ● 10 points — No memory leaks. Make sure to run your code under valgrind before each submit. (This is also a good idea because it will let you know if you have undefined behavior, which will cause your code to crash on the autograder.) When you start submitting test files to the autograder, it will tell you (in the section called “Scoring student test files”) how many bugs exist, the number needed to start earning points, and the number needed for full points. It will also tell you how many are needed to start earning an extra submit/day! 13 Checkpoint This project has a few test cases named CP* that represent a checkpoint. The checkpoint involves implementing base functionality for several commands, and you should have it completed within 10 days of receiving the project specification (less in the Spring; about halfway between receiving this document and the final due date). The checkpoint test cases will only be testing a subset of the functionality of your code. The functionality we will be testing are listed as follows: ● # (comment) Used in all 3 checkpoints ● CREATE Used in all 3 checkpoints ● INSERT INTO Used in checkpoints 2 and 3 ● PRINT FROM … ALL Used in checkpoints 2 and 3 ● REMOVE Used in all 3 checkpoints ● QUIT Used in all 3 checkpoints For example, checkpoint 1 has a comment, a few create and remove commands, and quits. None of the commands produce errors. Test Cases All test cases on the autograder test the QUIT command, because all valid input files must end with it. Many include comment(s) so that we know what the test case is doing. Most of the test case names on the autograder are fairly self-explanatory, but here’s a guide: Appendix: The test case from Appendix A (see below) CP*: Checkpoint cases (see above) CREATE*: Primarily tests the CREATE command, but also PRINT and REMOVE INSERT*: Primarily tests INSERT, but also needs CREATE, PRINT, and REMOVE DELETE*: Primarily DELETE; also needs CREATE, INSERT; sometimes REMOVE and GENERATE GENERATE*: Primarily GENERATE, needs CREATE, INSERT, PRINT; sometimes REMOVE, DELETE JOIN*: Primarily JOIN, but also needs CREATE, and INSERT; some test REMOVE and GENERATE PRINT*: Primarily tests PRINT, but also needs CREATE, INSERT, and DELETE REMOVE*: Primarily tests REMOVE, but also needs CREATE, INSERT, and sometimes GENERATE Short*: Tests all commands, “short” only with respect to Medium* and Long*; always run in quiet mode Medium*: Tests all commands; always run in quiet mode Long*: Tests all commands; always run in quiet mode CP*: Tests the commands listed in the checkpoint section above. 14 Appendix A: Example from the Spec Given the following as input: #Awesome Spec Example! CREATE 281class 3 string string bool emotion person Y/N INSERT INTO 281class 8 ROWS happy Darden true stressed students false busy office_hours true stressed students true stressed Paoletti true happy Darden true happy Sith true victorious Sith true DELETE FROM 281class WHERE person = Darden GENERATE FOR 281class hash INDEX ON emotion PRINT FROM 281class 2 person emotion WHERE Y/N = true CREATE pets 3 string bool bool Name likes_cats? likes_dogs? INSERT INTO pets 2 ROWS Sith true true Paoletti true false JOIN pets AND 281class WHERE Name = person AND PRINT 3 Name 1 emotion 2 likes_dogs? 1 REMOVE pets REMOVE 281class QUIT The output should be: (prompt characters and commands not included) New table 281class with column(s) emotion person Y/N created Added 8 rows to 281class from position 0 to 7 Deleted 2 rows from 281class Created hash index for table 281class on column emotion person emotion office_hours busy students stressed Paoletti stressed Sith happy Sith victorious Printed 5 matching rows from 281class New table pets with column(s) Name likes_cats? likes_dogs? created Added 2 rows to pets from position 0 to 1 Name emotion likes_dogs? Sith happy true Sith victorious true Paoletti stressed false Printed 3 rows from joining pets to 281class Table pets deleted Table 281class deleted Thanks for being silly! 15 Appendix B: Variant Types and You One of the most difficult parts of implementing an efficient database is handling the problem of heterogenous rows, or the fact that rows in the tables contain multiple types. If you wanted to store your rows as vector, what would T be? int? string? bool? The answer, in reality, is that you must be able to store any of the valid types in that row. Unfortunately the current C++ standard does not give us a good way of implementing these types of heterogeneous containers when the types of each column aren’t known at compile time (check out std::tuple<> for when you do!). To remedy this, the course staff has created a special class for you to store in your tables, aptly named TableEntry. This TableEntry is an implementation of what is known as a variant type, as it can be a variety of different types. While we could modify the TableEntry class to contain arbitrary types, for simplicity we have limited the types that can be represented to the set of those that are valid to appear in this project. Most of how to use the TableEntry should be intuitive from the comments in the header file. In addition to that, see the below example for further instructions. The one major downside to our implementation of TableEntry is that it doesn’t play nicely when you try to compare two TableEntry objects that contain different types, or try to compare a TableEntry with a type other than the one it represents internally. To combat that we’ve placed assertions in key locations such that if you do not compile with -DNDEBUG (by using make debug with the provided Makefile), these assertions will fire rather than let you have undefined behavior. If you’re getting weird results with this type, try running under debug conditions and see if an assertion fires. We attached a comment to the assertions so that you can see what the issue is. In order to see where the issue stems from, you must use a debugger and read the stack trace to find the line of your code that calls the offending method. We suggest against delving into the implementation of this class; it’s pretty complex. An example: #include “TableEntry.h” #include #include
EECS 281 –Programming Project 4 Among Us
Overview You are playing the popular multiplayer murder mystery game Among Us , in which there are one or more 1 impostors among a group of crewmates on a spaceship. The impostors’ goal is to sabotage the spaceship by killing all their crewmates before the crew finishes their tasks; the crewmates win by either completing their tasks or ejecting the impostors out of the spaceship before they can kill everyone. Impostors have access to an underground vent network that allows them to quickly move across the spaceship without detection. If a crewmate dies, they become a “ghost”; ghosts can still complete their tasks, and they can travel without restriction throughout the map. In this project, you will be playing the roles of both an Impostor (Part A) and a ghost Crewmate (Parts B and C) to identify efficient routes through the spaceship that will help you win the game. In Impostor mode (Part A), you’ll be setting up the vent network to travel quickly between rooms. You need to make sure that you can access any room from any other room. However, excavating is expensive, so you also want to minimize the total distance of the network. In Ghost mode (Parts B and C), you can move freely across the map, and you want to finish your tasks as quickly as possible to win. You can fly through walls, and you want to find the optimal path to finish all your tasks and end at the first task you did. You may assume that there is one task per room to be completed. To be clear: these scenarios are separate; the program will create a plan for one or the other, but not both in the same run (though you may find that algorithms from one mode help with another mode). In this project, your output does not need to be exactly the same as ours to be correct. As long as it provides a valid answer, and follows the rules for formatting output, it can be in a different order and still be correct. Project Goals ● Understand and implement MST algorithms. Be able to determine whether Prim’s or Kruskal’s is more efficient for a particular scenario. ● Understand and implement a Branch and Bound algorithm. Develop a fast and effective bounding algorithm. ● Explore various heuristic approaches to achieving a nearly-optimal solution as fast as possible. ○ Do some web searching for “TSP heuristics”, don’t choose one worse than O(n 2 ). ● Use a visualization tool to help with debugging. 1 Note that you do NOT have to have experience playing Among Us to understand/implement this project. Version 11-10-20 Written by many crewmates, and maybe an impostor or two © 2020 Regents of the University of Michigan 2 Map Input On startup, your program, amongus, reads input from standard input (cin) describing the locations of the rooms. The map is a grid with the decontamination zone on the bottom left quadrant (see ‘Path Rules’). You will be given a list of M rooms with associated integer coordinates (x, y). Rooms are identified by integer indices which correspond to the order in which they are read in (the first room location corresponds to location 0, the second room location to location 1, etc.). For parts B and C you always start in the 0th room. Formally, the input will be formatted in this manner: The first line will contain a single number denoting the number of rooms on the map. That will be followed by a list of integer x/y, coordinates in the form: x y. You may assume that the input will always be well-formed (it will always conform to this format and you do not need to error check). There may be additional blank lines at the end of the file (but nowhere else). Sample: 5 6 1 2 3 -5 -4 -1 6 0 -1 The above sample can be visualized as in the figure below, where the numbers shown are the room indices that starts at the 0-th room, and the blue line indicates the decontamination border (for more details, see the ‘Path Rules’ section) which separates the outer area from the lab: room 2 is in the lab, 4 is in decontamination, and the rest are in the outer area. There is more than one way to represent this configuration internally in your program, and this will affect your runtime. Choose your data structures wisely! Version 11-10-20 3 For Part A, there will always be at least two points. For Parts B and C, there will always be at least 3. Path Rules Distance We represent the paths you take as a set of pairs of points or as a sequence of points. Your calculations 2 should use Euclidean distance, and you should represent your distances as doubles. To connect rooms in Part A, you must only move between rooms in the same area. In order to reach rooms in the lab from the outer area (or vice-versa), you must first pass through a decontamination room. In Parts B and C, you are now a ghost (a minor inconvenience), and can fly directly from room to room. You move along the line segments that connect rooms. Please be very careful with rounding errors in your distance calculations; the autograder will allow you a 0.01 margin of error to account for rounding, but you should avoid rounding at intermediate steps. The Lab There is a lab that is located in the bottom left quadrant of the map. When first embarking on your journey (in Part A), you can travel using vents between different locations in the outer area. The lab is only accessible through decontamination, which consists of the negative portions of the x and y axes, including the origin (0, 0). In Part A, you may only enter and exit the lab by passing through a location in decontamination. For example, you are not allowed to travel directly from (-5, -4) to (6, 1). You must first travel from (-5, -4) to (0, -1), and then from (0, -1) to (6, 1). For the sake of simplicity, assume two outer area locations that “cross” the lab can be connected by a direct path. The example below shows two validly connected outer area rooms, at locations A and B. In this example, there is a direct path from A to B, by using vents. When you are a ghost (in Parts B and C), you can fly directly from locations in the lab to locations in the outer area, and vice versa; in other words, you may ignore decontamination. 2 “Path,” in this project, can be understood as a route between two locations. Version 11-10-20 4 Other Important: For parts B and C, you always start at the 0-th room location. You are also not allowed to ‘wrap around the edges of the world’ (you cannot go above the top of the map to arrive at the bottom). Command Line Input Your program, amongus, should take the following case-sensitive command line options: ● -m, –mode This command line option must be specified, if it is not, print a useful error message to standard error (cerr) and exit(1). MODE is a required argument. Set the program mode to MODE. MODE must be one of MST, FASTTSP, or OPTTSP. The MODE corresponds to the algorithm amongus runs (and therefore what it outputs). ● -h, –help Print a short description of this program and its arguments and exit(0). Valid examples of how to execute the program: ./amongus –mode MST (OK, but you must type the input by hand) ./amongus -h < inputFile.txt (OK, -h happens before we realize there’s no -m) ./amongus -m OPTTSP < inputFile.txt (OK, reads from a file on disk) ./amongus -m BLAH (BAD mode following -m) Remember that when we redirect input, it does not affect the command line. Redirecting input just sends the file contents to cin. You should not have to modify your code to allow this to work; the operating system will handle it. We will not be specifically error-checking your command-line handling, however we expect that your program conforms with the default behavior of getopt_long(). Incorrect command-line handling may lead to a variety of difficult-to-diagnose problems. Algorithms Your program should run one and only one of the following modes at runtime depending on the mode option for that particular program call. We divide it into ‘parts’ for your convenience, though you may find that elements and algorithms of some parts can help with others. Part A – MST mode Description This mode will devise a vent system that connects every room location while minimizing the total distance of vents needed. Version 11-10-20 5 When the program is run in the MST mode, it should calculate and print out an MST connecting all of the room locations. You may use any MST algorithm to connect all the locations. Hint: Unless you want to implement both and compare, think about the nature of the graph (how many vertices and edges does it have?). You are free to adapt code from the lecture slides to fit this project, but you will want to carefully think about the data structures necessary to do each part (storing unnecessary data can go over memory limits). Your program must always generate one valid MST for each input. Remember that in this part, you must respect the boundary between the lab and the outer area. . Your MST cannot connect a room in the lab to one in the outer area, without passing through a room in decontamination. Output Format (for Part A only) For the MST mode, you should print the total weight of the MST you generate by itself on a line; this weight is the sum of the weights of all edges in your MST (in terms of Euclidean distance). You should then print all edges in the MST. All output should be printed to standard output (cout). The output should be of the format: weight node node node node …… The nodes are the room location numbers corresponding to the vertices of the MST and a pair of nodes on a given line of the output describes an edge in the MST from the first node to the second. To be clear, the weight should be formatted as a double (2 decimal point precision is enough – see Appendix A), and the node numbers should all be integer values when printed. For example, given the example input file above, your MST mode output might be: 19.02 0 1 2 4 1 3 1 4 You should also always print the pairs of vertices that describe an edge such that the index on the left has a smaller integer value than the index on the right. In other words: 1 2 is a possible valid edge of output, but 2 1 is not. If it is not possible to construct an MST for the rooms (i.e. if there are rooms in both the outer area and in the lab, with no decontamination rooms), your program should print the message “Cannot construct MST” to cerr and exit(1). Version 11-10-20 6 Parts B & C (FASTTSP & OPTTSP mode) Description (for both Parts B and C) In this mode, you will figure out how to travel to every room and then return to your starting location. The route will always start at room index 0, visit every other room exactly once, and return to the starting point. Your job will thus be to solve the TSP (Traveling Salesperson Problem) and choose paths to room locations so as to minimize the total distance travelled. Euclidean (straight-line) distance is used here again to compute distances between rooms. Because you are now a ghost, you no longer have to worry about decontaminating; you can go from any room to any other. For FASTTSP mode, you do not need to produce an optimal TSP tour, but your solution should be close to optimal. Because your FASTTSP algorithm does not need to be perfectly optimal, we expect it to run much faster than finding a perfect optimal solution. Do some online searching for “TSP heuristics”. There are several types, some easier to implement, some with better path lengths, some both. For OPTTSP mode, you must find an optimal solution to the TSP (the actual minimum Euclidean distance necessary). More on the differences between OPTTSP and FASTTSP modes will be discussed later. For both methods: You must start the tour from the 0-th room location (your starting location). You must visit every room exactly once (there’s no point in returning to a place already checked), and at the end of the tour, you must return back to the 0-th location. The length of a tour is defined as the sum of all pairwise distances travelled – that is, the sum of the lengths of all paths taken (using Euclidean distance). Your program must print the indices of the locations in an order such that the length of this tour is as small as possible. More details about how to accomplish this are listed below. Output Format (for both Parts B and C) You should begin your output by printing the overall length of your tour on a line. On the next line, output the nodes in the order in which you visit them. The initial node should be the starting location index and the last node should be the location number directly before returning back to the 0-th location. The nodes in your tour should be printed such that they are separated by a single space. There can be a space after the last location listed. All output should be printed to standard output (cout). For example, if given the input file above, your program could produce: 31.64 0 4 2 3 1 or Version 11-10-20 7 31.64 0 1 3 2 4 Part B Only Description In the case where we have a large number of possible rooms to visit, finding an optimal method for visiting all of them may be too slow, and you may grow old and die before completing your task. You can use heuristics instead to find nearly-optimal tours. A heuristic is a problem-solving method (an algorithm) that can produce a good answer that is not necessarily the best answer. For example, you can skip a branch speculatively rather than waiting to know for a fact that it can be skipped. There are many other simple heuristic techniques, such as starting with a random tour and trying to improve it by small changes. You should be able to produce a solution to the Traveling Salesperson Problem (TSP) that is as close as possible to the optimal tour length (though it does not need to be optimal). In the best case, your produced tour length will equal the optimal tour length. You are allowed to use any combination of algorithms for this section that we have covered in class, including the MST algorithm you wrote for Part A. You will need to be creative when designing your algorithms for this section. You are free to implement any other algorithm you choose, so long as it meets the time and memory constraints. However, you should not use any advanced algorithms or formulas (such as Simulated Annealing, Genetic Algorithms and Tabu search – they are too slow) that are significantly different from what has been covered in class. Instead, creatively combine the algorithms that you already know and come up with concise optimizations. Your heuristic will very likely be greedy in some way, but there are different ways to be greedy! Part C Only Description To find an optimal tour, you could start with the brute force method of exhaustive enumeration that evaluates every tour and picks a smallest tour. By structuring this enumeration in a clever way, you could determine that some branches of the search cannot lead to optimal solutions. For example, you could compute lower bounds on the length of any full tour that can be found in a given branch. If such a lower bound exceeds the cost of a full solution you have found previously, you can skip this branch as hopeless. If implemented correctly, such a branch-and-bound method should always produce an optimal solution. It will not scale as well as sorting or searching algorithms do for other problems, but it should be usable with a small number of locations. Clever optimizations (identifying hopeless branches of search early) can make your algorithm a hundred times faster. Drawing TSP tours on paper and solving small location configurations to optimality by hand should be very useful. Remember that there is a tradeoff between the time it takes to run your bounding function and the number of branches that bound lets you prune. MAKE SURE that you use the version of genPerms() presented in either the “Project 4 Tutorial” or the “Backtracking BB TSP” lecture slides. The one presented earlier in the semester, in the “Stacks and Queues” lecture slides is much slower. Given an input set of N locations defined by integer coordinates, your job is to produce an optimal tour using branch-and-bound algorithms. Your program should always produce the shortest possible tour as a solution, even if computing that solution is time-consuming. You will be given a 35-second cpu time limit to generate Version 11-10-20 8 your solution. If your program does not produce a valid solution, it will fail the test case. Your solution will also be judged by time and space budgets as per previous projects. Libraries and Restrictions We highly encourage the use of the STL for this project, with the exception of several prohibited features. Do not use: ● The C++11 regular expressions library ● The STL thread/atomics libraries (which spoil runtime measurements) ● Shared or unique pointers. ● Other libraries (e.g., boost, pthreads, etc). Testing and Debugging Part of this project is to prepare several test files that will expose defects in buggy solutions – your own or someone else’s. As this should be useful for testing and debugging your programs, we recommend that you first try to catch a few of our intentionally-buggy solutions with your test files, before completing your solution. The autograder will also tell you if one of your own test files exposes bugs in your solution. Each test that you submit should consist of an input file. When we run your test files on one of intentionally-buggy project solutions, we compare the output to that of a correct project solution. If the outputs differ, the test file is said to expose that bug. Test files should be named test-n-MODE.txt where 1 <= n <= 10. The autograder’s buggy solutions will run your test files in the specified MODE. The mode must be MST, FASTTSP, or OPTTSP. Your test files may have no more than 10 coordinates in any one file. You may submit up to 10 test files (though it is possible to get full credit with fewer test files). The tests the autograder runs on your solution are NOT limited to 10 coordinates in a file; your solution should not impose any size limits (as long as sufficient system memory is available). Submitting to the Autograder Do all of your work (with all needed source code files, as well as test files) in some directory other than your home directory. This will be your “submit directory”. Before you turn in your code, be sure that: ● Every source code and header file contains the following project identifier in a comment at the top of the file: ● // Project Identifier: 9B734EC0C043C5A836EA0EBE4BEFEA164490B2C7 ● The Makefile must also have this identifier (in the first TODO block) ● You have deleted all .o files and your executable(s). Typing ‘make clean’ shall accomplish this. ● Your makefile is called Makefile. Typing ‘make -R -r’ builds your code without errors and generates an executable file called amongus. The command-line options -R and -r disable automatic build rules, which will not work on the autograder. Version 11-10-20 9 ● Your Makefile specifies that you are compiling with the gcc optimization option -O3. This is extremely important for getting all of the performance points, as -O3 can speed up code by an order of magnitude. ● Your test files are named test-n-MODE.txt and no other project file names begin with test. Up to 10 test files may be submitted. The “mode” portion of the filename must be MST, OPTTSP or FASTTSP. ● The total size of your program and test files does not exceed 2MB. ● You don’t have any unnecessary files (including temporary files created by your text editor and compiler, etc) or subdirectories in your submit directory (i.e. the .git folder used by git source code management). ● Your code compiles and runs correctly using version 6.2.0 of the g++ compiler. This is available on the CAEN Linux systems (that you can access via login.engin.umich.edu). Even if everything seems to work on another operating system or with different versions of GCC, the course staff will not support anything other than GCC 6.2.0 running on Linux (students using other compilers and OS did observe incompatibilities). To compile with g++ version 6.2.0 on CAEN you must put the following at the top of your Makefile: PATH := /usr/um/gcc-6.2.0/bin:$(PATH) LD_LIBRARY_PATH := /usr/um/gcc-6.2.0/lib64 LD_RUN_PATH := /usr/um/gcc-6.2.0/lib64 Turn in all of the following files: ● All your .h and/or .cpp files for the project ● Your Makefile ● Your test files You must prepare a compressed tar archive (.tar.gz file) of all of your files to submit to the autograder. One way to do this is to have all of your files for submission (and nothing else) in one directory. In this directory, run dos2unix *; tar czvf ./submit.tar.gz *.cpp *.h* Makefile test-*.txt This will prepare a suitable file in your working directory. If you’re using our Makefile, use the command make fullsubmit. Submit your project files directly to either of the two autograders at: https://g281-1.eecs.umich.edu/ or https://g281-2.eecs.umich.edu/. When the autograders are turned on and accepting submissions, there will be an announcement on Piazza. The autograders are identical and your daily submission limit will be shared (and kept track of) between them. You may submit up to four times per calendar day (more during the Spring). For this purpose, days begin and end at midnight (Ann Arbor local time). We will count only your best submission for your grade. If you would instead like us to use your LAST submission, see the autograder FAQ page, or use this form. Please make sure that you read all messages shown at the top section of your autograder results! These messages often help explain some of the issues you are having (such as losing points for having a bad Makefile or why you are segfaulting). Also be sure to check if the autograder shows that one of your own test files exposes a bug in your solution (at the bottom). Search the autograder results for the word “Hint” (without quote marks) for potentially useful information. Version 11-10-20 10 Grading 80 points — Your grade will be derived from correctness and performance (runtime). This will be determined by the autograder. On this project we expect a much broader spread of runtimes than on previous projects. As with all projects, the test cases used for the final grading are likely to be different. 10 points — Your program does not leak memory. Make sure to run your code under valgrind before each submit. This is also a good idea because it will let you know if you have undefined behavior (such as reading an uninitialized variable), which may cause your code to crash on the autograder. 10 points — Student test file coverage (effectiveness at exposing buggy solutions). We also reserve the right to deduct up to 5 points for bad programming style, code that is unnecessarily duplicated, etc. Runtime Quality Tradeoffs In this project there is no single correct answer (unlike previous projects). Accordingly, the grading of your problem will not be as simple as a ‘diff’, but will instead be a result of evaluating your output. For example, if we gave you a square for Part A, you might choose any 3 of the 4 edges, and print them in any order, meaning there are 24 possible correct output files! Particularly for Part B, we expect to see greater variation in student output. Part B asks you to solve a hard problem, and with the given time constraints, we don’t actually expect your output to be optimal for all cases. The quality of your solutions may even vary from case to case. We want you to quickly produce solutions that are close to optimal. This inevitably creates tradeoffs between solution optimality and runtime. You may find it useful to implement more than one algorithm or heuristic that you use in Part B, and do some testing plus use the autograder to determine which works the best.. Your grade for Part B will be determined based on how close you are to the best solution, computed as a percentage. Hints and Advice There’s a project tutorial video available: https://youtu.be/hdTrbU28pnk It will be difficult to get this project correct without visualizing your MSTs and TSP tours. We have provided a visualization tool on the Autograder that you can use; follow this link: https://g281-2.eecs.umich.edu/p4viz/. Running your code (on CAEN or locally) using valgrind can help you find and remove undefined (buggy) behavior and memory leaks from your code. This can save you from losing points in the final run when you mistakenly believe your code to be correct. Version 11-10-20 11 It is extremely helpful to compile your code with the following gcc options: -Wall -Wextra -Wvla -Wconversion -pedantic. This way the compiler can warn you about poor style and parts of your code that may result in unintended/undefined behavior. Make sure that you are using getopt_long() for handling command-line options. Appendix A In order to ensure that your output is within the tolerable margins of error for the autograder to correctly grade your program you must run the following lines of code before you output anything to cout. We highly recommend that you simply put them at the top of main() so that you don’t forget about them. cout << std::setprecision(2); //Always show 2 decimal places cout << std::fixed; //Disable scientific notation for large numbers You will need to #include to use this code. If you use a stringstream for output, you should send these lines to your stringstream instead of cout. Don’t use a stringstream for output. Appendix B One possible heuristic (NOT the only one, NOT an easy one) is to construct a starting point for your TSP tour by following MST edges and skipping previously visited vertices. By using this technique correctly, you can find a tour that is guaranteed to be no more than twice the optimal solution’s length (and use this “2x” check for debugging). You can then use this starting point to make adjustments and do better. This is not necessarily the best approach! It is one approach used to illustrate how a heuristic can be used, there are better ones out there. This example method is VERY hard to code. Do some research on other heuristics! “Corner Cutting” algorithm illustrated Suppose locations are distributed in an input map as shown below: Below is an MST that would be formed from the above locations. Version 11-10-20 12 Below is a path taken by strictly following the edges of the MST. However, by “cutting corners,” as described in the picture below, an effective TSP solution can be generated. This is possible because once a vertex is visited, it will not be visited again. In the above path that strictly follows the edges of the MST, the middle vertex is visited four times (visited after an outer vertex is visited). If the middle vertex is skipped after the first visit, a TSP tour shown below is achieved. The reason we bring up this ‘twice-around-the-tree’ heuristic is to state the theorem that the resulting tours are no worse than 2x the MST cost. The following is an explanation/proof sketch as to why the 2x bound is valid: If you perform a DFS traversal of a tree and return to the initial node, you have visited every edge exactly twice (“going in” and “returning”), so this gives you exactly double the cost/length of the MST (the fact that the tree is minimal is not important for the logic of the proof – this works for any tree). Since the tree is spanning, you have Version 11-10-20 13 visited all locations. The only problem with this twice-around-the-tree self-intersecting path is that it’s not a tour. It can be turned into a tour by taking shortcuts (i.e., taking shortcuts is important not only to reduce the length). When considering other heuristics: 1. The theorem allows you to upper-bound the cost of optimal TSP tours based on MST length. 2. The theorem has a constructive proof – a heuristic that always produces TSP tours that satisfy this upper bound . 3 As a consequence, your heuristic should also satisfy the 2x upper bound; if not, you can just implement the twice-around-the-tree heuristic. However, we do not require this because there are much better heuristics – ones that are faster and produce better results. Appendix C Legend for test case names and autograder info Autograder test names are broken down by the mode used with each test: ● Name of test file starting with “A”: these test files are used with the MST mode. ● Name of test file starting with “B”: these test files are used with the FASTTSP mode. ● Name of test file starting with “C”: these test files are used with the OPTTSP mode. Test files that start with “C” are much smaller in size than those that start with “A” and “B”. The number of rooms for such files is smaller than 40 as compared to tens of thousands of rooms for the others. The reason is that TSP is NP-complete and finding an optimal solution in a reasonable amount of time is only possible for small-sized problems. When you start submitting test files to the autograder, it will tell you (in the section called “Scoring student test files”) how many bugs exist, the number needed to start earning points, and the number needed for full points. It will also tell you how many are needed to start earning an extra submit/day! 3 Such heuristics are also called approximation algorithms. Version 11-10-20