Description
Spring 2025
2. String hash function
In this problem you will write your own hash function for std::string in the provided hash.h file. While std::strings can be of arbitrary length and contain any legal ASCII character, we will assume the largest string you will ever receive is 28 letters long (e.g. antidisestablishmentarianism) and only contains letters and digits (‘0’-‘9’). For the letters, you should treat upper-case letters the same as you would lower-case letters. The Approach The basic approach will be to treat groups of letters/digits as base-36 and convert to an integer value (i.e. the decimal equivalent). First translate each letter into a value between 0 and 35, where a=0, b=1, c=2, …, z=25 and digit ‘0’=26, … ‘9’=35. Be sure to convert an upper case letter to lower case before performing this mapping. Next you will translate a (sub)string of 6 letters a1 a2 a3 a4 a5 a6 into an (unsigned long long) 64-bit integer w[i] (essentially converting from base-36 to decimal), via the following mathematical formula: w[i]=365∗a1+364∗a2+363∗a3+362∗a4+36∗a5+a6 Place zeros in the leading positions if your (sub)string is shorter than 6 letters. So an example string like “104” would set a1, a2, and a3 all to zero (since we only have a length 3 string) and yield 362∗27+36∗26+30 (since ‘1’ : 27, ‘0’ : 26, and ‘4’ : 30) You should use the base conversion approach taught in class to avoid repeated calls to pow(). Note that indexing is a bit tricky here so think carefully. The digit at the “end” of the string is assigned the power 36^0, and the 2nd to last letter is worth 36^1, etc. If an input word is longer than 6 letters, then you should first do the above process for the last 6 letters in the word, then repeat the process for each previous group of 6 letters. Recall, you will never receive a word longer than 28 characters. The last group may not have 6 letters in which case you would treat it as a substring of less than 6 characters as described above. Since we have at most 28 characters this process should result in a sequence of no more than 5 integers: w[0] w[1] w[2] w[3] w[4], where w[4] was produced by the last 6 letters of the word. Store these values in an array (of unsigned long long). Place zeros in the leading positions of w[i] if the string does not contain enough characters to make use of those values. So for a string of 12 letters, w[0], w[1], and w[2] would all be 0 and only w[3] and w[4] would be (potentially) non-zero. We will now hash the string. Use the following formula to produce and return the final hash result using the formula below (where the r values are explained below). h(k)=(r[0]∗w[0]+r[1]∗w[1]+r[2]∗w[2]+r[3]∗w[3]+r[4]∗w[4]) where the r[i] values are either: ● 5 random numbers created when you instantiate the hash function using the current time as your seed (i.e. srand(time(0))). Be sure to #include and #include . ● 5 preset values for debugging (so we all get the same hash results): r[0]=983132572, r[1]=1468777056, r[2]=552714139, r[3]=984953261, r[4]=261934300 (these numbers were generated by random.org) Note: We will allow h(k) to produce a large integer (beyond the range of m, the table size) and only mod it by the table size in the hash table. The constructor of your hash object will take a debug parameter, which, if true should set the r values to the above debugging values, and if false should select the randomized values. Note: Make sure you compute using unsigned long long variables or cast (constants or other variables) to that type at the appropriate places so that you don’t have an overflow error. Testing We have provided a simple test program, str-hash-test.cpp where you can provide a string on the command line and it will hash the string and output the hash result. Below is some debug output of the w[i] values computed for several test strings using the debug r values. Below is the output for a few strings: ./str-hash-test abc yields w[0] = 0 w[1] = 0 w[2] = 0 w[3] = 0 w[4] = 38 h(abc)=9953503400 ./str-hash-test abc123 yields w[0] = 0 w[1] = 0 w[2] = 0 w[3] = 0 w[4] = 1808957 h(abc123)=473827885525100 ./str-hash-test B yields w[0] = 0 w[1] = 0 w[2] = 0 w[3] = 0 w[4] = 1 h(B)=261934300 ./str-hash-test gfedcba yields w[0] = 0 w[1] = 0 w[2] = 0 w[3] = 6 w[4] = 309191940 h(gfedcba)=80987980279261566 ./str-hash-test antidisestablishmentarianism yields w[0] = 17540 w[1] = 195681115 w[2] = 2203855 w[3] = 732943745 w[4] = 484346964 h(antidisestablishmentarianism)=1137429692708383810 3. Hash table with linear and double hasing You will create a HashTable data structure that uses open-addressing (i.e. uses probing for collision resolution) and NOT chaining. Your table should support linear and double hasing probing via functors. It should support resizing when the loading factor is above a user-defined threshold. You may use the std::hash<> functor provided with the C++ std library as well as your own hash function for strings made of letters and digits only (no punctuation) that you created in an earlier problem. You should complete the implementation in ht.h. template< typename K, typename V, typename Prober = LinearProber, typename Hash = std::hash, typename KEqual = std::equal_to > class HashTable { /* Implementation */ }; The template types are as follows: ● K: the key type (just like a normal map) ● V: the value type (just like a normal map) ● Prober: a functor that supports an init() and next() function that the hash table can use to generate the probing sequence (e.g. linear, quadratic, double-hashing). A base Prober class has been written and a derived LinearProber is nearly complete. You will then write your own DoubleHashProber class to implement double-hash probing. The actual Prober type that is passed may contain other template arguments but must at least take the Key type as a template argument. The DoubleHashProber will take the key type AND the second hash function type (see below for more details). ● Hash: The primary hash function that converts a value of type K to a std::size_t which we have typedef’d as HASH_INDEX_T. ● KEqual: A functor that takes two K type objects and should return true if the two objects are equal and false, otherwise. Internally, we will use a hash table of pointers to HashItems (i.e. std::vector<HashItem*> > ). The HashItem has the following members: typedef std::pair<KeyType, ValueType> ItemType; struct HashItem { ItemType item; // key, value pair bool deleted; }; You should allocate a HashItem when inserting and free them at an appropriate time based on the description below. If no location is available to insert the item, you should throw std::logic_error, though since we are not using Quadratic probing and we resize the table when the loading factor is above a certain threshold, this should not happen. (Yet we include it in case we do implement quadratic probing in the future.) We have also provided a constant array of prime sizes that can be used, in turn, when resizing and rehashing is necessary. This sequence is: 11, 23, 47, 97, 197, 397, 797, 1597, 3203, 6421, 12853, 25717, 51437, 102877, 205759, 411527, 823117, 1646237, 3292489, 6584983, 13169977, 26339969, 52679969, 105359969, 210719881, 421439783, 842879579, 1685759167. Thus, when a HashTable is constructed it should start with a size of 11. The client will supply a loading factor, alpha, at construction, and before inserting any new element, you should resize (to the next larger size in the list above) if the current loading factor is at or above the given alpha. For removal of keys, we will use a deferred strategy of simply marking an item as “deleted” using a bool value. These values should not be considered when calls are made to find or operator[] but should continue to count toward the loading factor until the next resize/rehash. At that point, when you resize, you will only rehash the non-deleted items and (permanently) remove the deleted items. We will not ask you to implement an iterator for the hash table. So find() will simply return a pointer to the key,value pair if it exists and nullptr if it does not. To facilitate tracking relevant statistics we will use for performance measurements, we have provided the core probing routine: probe(key). This routine, applies the hash function to get a starting index, then initializes the prober and repeatedly calls next() until it finds the desired key, reaches a hashtable location that contains nullptr (where a new item may be inserted if desired), or reaches its upper limit of calls (i.e. cannot find a null location). probe() utilizes the Prober. Prober::init is called to give the prober all relevant info that it may need, regardless of the probing strategy (i.e. starting index, table size, the key (which would be needed by double-hash probing) etc.). Then a sequence of calls to Prober::next() will ensue. If, for example we are using linear probing, the first call to next() would yield h(k), the subsequent call to next() would yield h(k)+1, the third call would yield h(k)+2, etc. Notice: the first call to next() just returns the h(k) value passed to Prober::init(). As probing progresses we will update statistics such as the total number of probes. Some accessor and mutator functions are provided to access those statistics. Note: When probing to insert a new key, we could stop on a location marked as deleted and simply overwrite it with the key to be inserted. However, because our design uses the same probe(key) function for all three operations: insert, find, and remove, and certainly for find and remove we would NOT want to stop on a deleted item, but instead keep probing to find the desired key, we will simply take the more inefficient probing approach of not reusing locations marked for deletion when probing for insertion. A debug function, reportAll is also provided to print out each key value pair. Use this when necessary to help you debug your code. Probers We have abstracted the probing sequence so various strategies may be implemented without the hash table implementation being modified. Complete the LinearProber and then write a similar DoubleHashProber that uses double-hashing. Return npos after m calls to next() . Double Hash Prober Your double-hash prober should take in two template arguments: template struct DoubleHashProber : public Prober { /* Code */ }; The KeyType is the same as that of the hash table, while Hash2 should be a function object that implements a HASH_INDEX_T operator()(const KeyType& key) const. Our double-hash prober should use a stepsize of m2 – h2(k)%m2 where m2 is some prime moduli less than the current hash table size. Since our double hash prober needs this separate m2, we have a static array of moduli to use and provide a helper function to determine which moduli to use, given the hash table size, m passed to init(). The init() is complete but can be appended to, if need be. Testing Your Hash Table A simple test driver program ht-test.cpp has been provided which you may modify to test various aspects of your hash table implementation. We will not grade this file so use it as you please. BUT PLEASE do some sanity tests with it BEFORE using any of our test suite. 4. Boggle This part of the assignment is another problem that requires a recursive back-tracking approach to solve. Boggle is a word game where players roll a randomized square grid of letters and then race to find all of the words that can be spelled left-to-right, down or diagonal (down and to the right). The goal is to identify the most words. In our version of Boggle we are creating a randomized grid of letters (see boggle.cpp) using the Scrabble letter frequencies (truly randomized letters will form fewer words). In our version of Boggle we also want to search for and return only the longest word that starts at a particular position (return EARS not EAR). In order to facillitate this feature we provide a std::set made up of all of the word prefixes found in the dictionary. Using this you can tell that EAR is not only a terminal word, but is also a prefix for other words. We also provide a dictionary based on the Scrabble tournament word list (so no single letter words). See boggle.cpp for the code that creates both the prefix set and dictionary. boggle-driver.cpp is the main program for this problem and you should not need to change it (if you do make changes, make sure not to delete the code that outputs the words, as this is used by the autochecker). You execute boggle-driver with the board size, seed and dictionary file as command line arguments. We also give you boggle() in boggle.cpp. This function iterates over each position in the board, starting a new search at that location in the three required directions. Your task is to implement boggleHelper() in boggle.cpp to recursively search for words in the given board. The r and c arguments are the current position in the board. The dr and dc input argument set the search direction. You will not need to change dr or dc as you search, rather use them to set r and c as you recurse to the next position. Your solution must recursively build up a word along the direction of search and insert only the longest word found into the std::set result set. You’ll note that boggleHelper() is type bool, you will need to use this return value to help you insert only the longest word found. Your solution must be backtracking, i.e if adding a new letter along the search direction does not continue to make a longer word, the recursion must stop, though you may need to add a word to the result set at this point. You can use the std::set prefix input argument to see if a string is a prefix of a longer word or words. If you don’t use a back-tracking approach it is likely that your solution will be too slow to pass the submission tests. You can treat the board like a 2D array of char, accessing it like board[x][y]. See printBoard() in boggle.cpp. See the next page in the guide for some example executions. 5. Boggle Examples ./boggle-driver 3 999 dict.txt I R R U M A E D O Found 5 words: DO, ED, MA, MO, UM ./boggle-driver 4 11 dict.txt C I R N J J W I U A P E G D V N Found 6 words: AD, APE, EN, JUG, PE, WE ./boggle-driver 5 10 dict.txt I H U N D E I D N M E H A U R I R F C A D A I T N Found 13 words: AIT, AN, DA, EH, ER, HA, HI, HUN, ID, IT, NU, RAN, UN ./boggle-driver 7 1234 dict.txt D T E O A I V P E W I G A E S E A W H I I A A I C E E Z O S P W U P U G O N S F S E S O V K L T U Found 30 words: AA, AE, AG, AI, AS, AW, CEE, DE, EW, GAE, GI, GO, HE, HI, ICE, OI, ONS, OS, PE, PEW, PST, SAPS, SEA, SO, TE, TEE, UP, US, WE, WIG ./boggle-driver 10 104 dict.txt I E R T A I Q K A N S J I Z I A D C I E L E A E L P E O V I Z M R H H A W E E N Q L S D E O E N E S Q O E S A S T I S X L N T O C I L Z H L N I T O F R O A E I B R I Z N C V Q F W R O R Q Q E W Y O E Found 72 words: AA, AD, AE, AH, AI, AIL, AN, ARSE, AS, AWEE, BO, DA, DE, DEW, DOES, EF, EH, EL, EM, EN, ER, ES, ET, EW, EWE, EX, FE, FRO, HAW, HE, HES, HI, HOT, ID, INS, IS, IT, KA, KI, LA, LEA, LI, LO, NE, NIT, OE, OES, OF, ON, ONE, OR, OS, PA, PE, RIA, SER, SETT, SHE, SIR, SO, TA, TI, TIP, TIS, TO, TONE, VEES, WE, WEEN, WET, YO, ZA 6. Boggle Testing On this page you can run a checker to help you pass the autograder. The first few Boggle tests are from the example page. The others are “hidden” in that we don’t tell you the output list, just how many words are found. Boggle tests This autochecker will run your Boggle program with the following tests. For this first set, see the example page for the expected result. ./boggle-driver 3 999 dict.txt ./boggle-driver 4 11 dict.txt ./boggle-driver 5 10 dict.txt ./boggle-driver 7 1234 dict.txt ./boggle-driver 10 104 dict.txt The next 5 tests are “hidden”: ./boggle-driver 3 98765 dict.txt (must find 4 words) ./boggle-driver 5 314 dict.txt (must find 14 words) ./boggle-driver 7 8675309 dict.txt (must find 43 words) ./boggle-driver 9 4326872 dict.txt (must find 54 words) ./boggle-driver 11 88 dict.txt (must find 98 words) Check It!