Description
Introduction In this programming assignment, you will practice implementing hash functions and hash tables and using them to solve algorithmic problems. In some cases you will just implement an algorithm from the lectures, while in others you will need to invent an algorithm to solve the given problem using hashing.
Learning Outcomes Upon completing this programming assignment you will be able to: 1. Apply hashing to solve the given algorithmic problems. 2. Implement a simple phone book manager. 3. Implement a hash table using the chaining scheme. 4. Find all occurrences of a pattern in text using Rabin–Karp’s algorithm.
Passing Criteria: 2 out of 3 Passing thisprogramming assignmentrequires passingat least2out of3code problemsfrom thisassignment. In turn, passing a code problem requires implementing a solution that passes all the tests for this problem in the grader and does so under the time and memory limits specified in the problem statement.
Contents 1 Problem: Phone book 3
2 Problem: Hashing with chains 5
3 Problem: Find pattern in text 9
4 General Instructions and Recommendations on Solving Algorithmic Problems 11 4.1 Reading the Problem Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 4.2 Designing an Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 4.3 Implementing Your Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 4.4 Compiling Your Program . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 4.5 Testing Your Program . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 4.6 Submitting Your Program to the Grading System . . . . . . . . . . . . . . . . . . . . . . . . . 13 4.7 Debugging and Stress Testing Your Program . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1
5 Frequently Asked Questions 14 5.1 I submit the program, but nothing happens. Why? . . . . . . . . . . . . . . . . . . . . . . . . 14 5.2 I submit the solution only for one problem, but all the problems in the assignment are graded. Why? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14 5.3 What are the possible grading outcomes, and how to read them? . . . . . . . . . . . . . . . . 14 5.4 How to understand why my program fails and to fix it? . . . . . . . . . . . . . . . . . . . . . 15 5.5 Why do you hide the test on which my program fails? . . . . . . . . . . . . . . . . . . . . . . 15 5.6 My solution does not pass the tests? May I post it in the forum and ask for a help? . . . . . 16 5.7 My implementation always fails in the grader, though I already tested and stress tested it a lot. Would not it be better if you give me a solution to this problem or at least the test cases that you use? I will then be able to fix my code and will learn how to avoid making mistakes. Otherwise, I do not feel that I learn anything from solving this problem. I am just stuck. . . 16
2
1 Problem: Phone book Problem Introduction In this problem you will implement a simple phone book manager.
Problem Description Task. In this task your goal is to implement a simple phone book manager. It should be able to process the following types of user’s queries: ∙ add number name. It means that the user adds a person with name name and phone number number to the phone book. If there exists a user with such number already, then your manager has to overwrite the corresponding name. ∙ del number. It means that the manager should erase a person with number number from the phone book. If there is no such person, then it should just ignore the query. ∙ find number. It means that the user looks for a person with phone number number. The manager should reply with the appropriate name, or with string “not found” (without quotes) if there is no such person in the book. Input Format. There is a single integer N in the first line — the number of queries. It’s followed by N lines, each of them contains one query in the format described above. Constraints. 1 ≤ N ≤ 105. All phone numbers consist of decimal digits, they don’t have leading zeros, and each of them has no more than 7 digits. All names are non-empty strings of latin letters, and each of them has length at most 15. It’s guaranteed that there is no person with name “not found”. Output Format. Print the result of each find query — the name corresponding to the phone number or “not found” (without quotes) if there is no person in the phone book with such phone number. Output one result per line in the same order as the find queries are given in the input. Time Limits. C: 3 sec, C++: 3 sec, Java: 6 sec, Python: 6 sec. C#: 4.5 sec, Haskell: 6 sec, JavaScript: 9 sec, Ruby: 9 sec, Scala: 9 sec. Memory Limit. 512Mb. Sample 1. Input: 12 add 911 police add 76213 Mom add 17239 Bob find 76213 find 910 find 911 del 910 del 911 find 911 find 76213 add 76213 daddy find 76213 Output:
3
Mom not found police not found Mom daddy Explanation: 76213 is Mom’s number, 910 is not a number in the phone book, 911 is the number of police, but then it was deleted from the phone book, so the second search for 911 returned “not found”. Also, note that when the daddy was added with the same phone number 76213 as Mom’s phone number, the contact’s name was rewritten, and now search for 76213 returns “daddy” instead of “Mom”. Sample 2. Input: 8 find 3839442 add 123456 me add 0 granny find 0 find 123456 del 0 del 0 find 0 Output: not found granny me not found Explanation: Recall that deleting a number that doesn’t exist in the phone book doesn’t change anything.
Starter Files ThestartersolutionsforC++, JavaandPython3inthisproblemreadtheinput, implementanaivealgorithm tolookupnamesbyphonenumbersandwritetheoutput. Youneedtouseafastdatastructuretoimplement a better algorithm. If you use other languages, you need to implement the solution from scratch.
What to Do Use the direct addressing scheme.
Need Help? Ask a question or see the questions asked by other learners at this forum thread.
4
2 Problem: Hashing with chains Problem Introduction In this problem you will implement a hash table using the chaining scheme. Chaining is one of the most popular ways of implementing hash tables in practice. The hash table you will implement can be used to implement a phone book on your phone or to store the password table of your computer or web service (but don’t forget to store hashes of passwords instead of the passwords themselves, or you will get hacked!).
Problem Description Task. In this task your goal is to implement a hash table with lists chaining. You are already given the number of buckets m and the hash function. It is a polynomial hash function h(S) =⎛ ⎝ |S|−1 ∑︁ i=0 S[i]xi mod p⎞ ⎠mod m, where S[i] is the ASCII code of the i-th symbol of S, p = 1 000 000 007 and x = 263. Your program should support the following kinds of queries: ∙ add string — insert string into the table. If there is already such string in the hash table, then just ignore the query. ∙ del string — remove string from the table. If there is no such string in the hash table, then just ignore the query. ∙ find string — output “yes” or “no” (without quotes) depending on whether the table contains string or not. ∙ check i — output the content of the i-th list in the table. Use spaces to separate the elements of the list. If i-th list is empty, output a blank line. When inserting a new string into a hash chain, you must insert it in the beginning of the chain. Input Format. There is a single integer m in the first line — the number of buckets you should have. The next line contains the number of queries N. It’s followed by N lines, each of them contains one query in the format described above. Constraints. 1 ≤ N ≤ 105; N 5 ≤ m ≤ N. All the strings consist of latin letters. Each of them is non-emptyand has length at most 15. Output Format. Print the result of each of the find and check queries, one result per line, in the same order as these queries are given in the input. Time Limits. C: 1 sec, C++: 1 sec, Java: 5 sec, Python: 7 sec. C#: 1.5 sec, Haskell: 2 sec, JavaScript: 7 sec, Ruby: 7 sec, Scala: 7 sec. Memory Limit. 512Mb.
5
Sample 1. Input: 5 12 add world add HellO check 4 find World find world del world check 4 del HellO add luck add GooD check 2 del good Output: HellO world no yes HellO GooD luck Explanation: The ASCII code of ’w’ is 119, for ’o’ it is 111, for ’r’ it is 114, for ’l’ it is 108, and for ’d’ it is 100. Thus, h(“world”) = (119+111×263+114×2632 +108×2633 +100×2634 mod 1 000 000 007) mod 5 = 4. It turns out that the hash value of HellO is also 4. Recall that we always insert in the beginning of the chain, so after adding “world” and then “HellO” in the same chain index 4, first goes “HellO” and then goes “world”. Of course, “World” is not found, and “world” is found, because the strings are case-sensitive, and the codes of ’W’ and ’w’ are different. After deleting “world”, only “HellO” is found in the chain 4. Similarly to “world” and “HellO”, after adding “luck” and “GooD” to the same chain 2, first goes “GooD” and then “luck”. Sample 2. Input: 4 8 add test add test find test del test find test find Test add Test find Test Output: yes no no yes Explanation: Adding “test” twice is the same as adding “test” once, so first find returns “yes”. After del, “test” is
6
no longer in the hash table. First time find doesn’t find “Test” because it was not added before, and strings are case-sensitive in this problem. Second time “Test” can be found, because it has just been added. Sample 3. Input: 3 12 check 0 find help add help add del add add find add find del del del find del check 0 check 1 check 2 Output: no yes yes no
add help
Explanation: Note that you need to output a blank line when you handle an empty chain. Note that the strings stored in the hash table can coincide with the commands used to work with the hash table.
Starter Files There are starter solutions only for C++, Java and Python3, and if you use other languages, you need to implement solution from scratch. Starter solutions read the input, do a full scan of the whole table to simulate each find operation and write the output. This naive simulation algorithm is too slow, so you need to implement the real hash table.
What to Do Follow the explanations about the chaining scheme from the lectures. Remember to always insert new strings in the beginning of the chain. Remember to output a blank line when check operation is called on an empty chain. Some hints based on the problems encountered by learners: ∙ Beware of integer overflow. Use long long type in C++ and long type in Java where appropriate. Takeeverything (mod p) assoonaspossiblewhilecomputingsomething (mod p),sothatthenumbers are always between 0 and p−1.
7
∙ Beware of taking negative numbers (mod p). In many programming languages, (−2)%5 ̸= 3%5. Thus you can compute the same hash values for two strings, but when you compare them, they appear to be different. To avoid this issue, you can use such construct in the code: x ← ((a%p) + p)%p instead of just x ← a%p. Need Help? Ask a question or see the questions asked by other learners at this forum thread.
8
3 Problem: Find pattern in text Problem Introduction In this problem, your goal is to implement the Rabin–Karp’s algorithm.
Problem Description Task. In this problem your goal is to implement the Rabin–Karp’s algorithm for searching the given pattern in the given text. Input Format. There are two strings in the input: the pattern P and the text T. Constraints. 1 ≤|P|≤|T|≤ 5·105. The total length of all occurrences of P in T doesn’t exceed 108. The pattern and the text contain only latin letters. Output Format. Print all the positions of the occurrences of P in T in the ascending order. Use 0-based indexing of positions in the the text T. Time Limits. C: 1 sec, C++: 1 sec, Java: 5 sec, Python: 5 sec. C#: 1.5 sec, Haskell: 2 sec, JavaScript: 3 sec, Ruby: 3 sec, Scala: 3 sec. Memory Limit. 512Mb. Sample 1. Input: aba abacaba Output: 0 4
Explanation: The pattern aba can be found in positions 0 (abacaba) and 4 (abacaba) of the text abacaba. Sample 2. Input: Test testTesttesT Output: 4
Explanation: Pattern and text are case-sensitive in this problem. Pattern Test can only be found in position 4 in the text testTesttesT. Sample 3. Input: aaaaa baaaaaaa Output: 1 2 3 Explanation: Note that the occurrences of the pattern in the text can be overlapping, and that’s ok, you still need to output all of them.
9
Starter Files The starter solutions in C++, Java and Python3 read the input, apply the naive O(|T||P|) algorithm to this problem and write the output. You need to implement the Rabin–Karp’s algorithm instead of the naive algorithm and thus significantly speed up the solution. If you use other languages, you need to implement a solution from scratch.
What to Do Implement the fast version of the Rabin–Karp’s algorithm from the lectures. Some hints based on the problems encountered by learners: ∙ Beware of integer overflow. Use long long type in C++ and long type in Java where appropriate. Takeeverything (mod p) assoonaspossiblewhilecomputingsomething (mod p),sothatthenumbers are always between 0 and p−1. ∙ Beware of taking negative numbers (mod p). In many programming languages, (−2)%5 ̸= 3%5. Thus you can compute the same hash values for two strings, but when you compare them, they appear to be different. To avoid this issue, you can use such construct in the code: x ← ((a%p) + p)%p instead of just x ← a%p. ∙ Use operator == in Python instead of implementing your own function AreEqual for strings, because built-in operator == will work much faster. ∙ In C++, method substr of string creates a new string, uses additional memory and time for that, so use it carefully and avoid creating lots of new strings. When you need to compare pattern with a substring of text, do it without calling substr. ∙ In Java, however, method substring does NOT create a new String. Avoid using new String where it is not needed, just use substring.
Need Help? Ask a question or see the questions asked by other learners at this forum thread.
10
4 General Instructions and Recommendations on Solving Algorithmic Problems Your main goal in an algorithmic problem is to implement a program that solves a given computational problem in just few seconds even on massive datasets. Your program should read a dataset from the standard input and write an answer to the standard output. Below we provide general instructions and recommendations on solving such problems. Before reading them, go through readings and screencasts in the first module that show a step by step process of solving two algorithmic problems: link.
4.1 Reading the Problem Statement You start by reading the problem statement that contains the description of a particular computational task as well as time and memory limits your solution should fit in, and one or two sample tests. In some problems your goal is just to implement carefully an algorithm covered in the lectures, while in some other problems you first need to come up with an algorithm yourself.
4.2 Designing an Algorithm If your goal is to design an algorithm yourself, one of the things it is important to realize is the expected running time of your algorithm. Usually, you can guess it from the problem statement (specifically, from the subsection called constraints) as follows. Modern computers perform roughly 108–109 operations per second. So, if the maximum size of a dataset in the problem description is n = 105, then most probably an algorithm with quadratic running time is not going to fit into time limit (since for n = 105, n2 = 1010) while a solution with running time O(nlogn) will fit. However, an O(n2) solution will fit if n is up to 103 = 1000, and if n is at most 100, even O(n3) solutions will fit. In some cases, the problem is so hard that we do not know a polynomial solution. But for n up to 18, a solution with O(2nn2) running time will probably fit into the time limit. To design an algorithm with the expected running time, you will of course need to use the ideas covered in the lectures. Also, make sure to carefully go through sample tests in the problem description.
4.3 Implementing Your Algorithm When you have an algorithm in mind, you start implementing it. Currently, you can use the following programming languages to implement a solution to a problem: C, C++, C#, Haskell, Java, JavaScript, Python2, Python3, Ruby, Scala. For all problems, we will be providing starter solutions for C++, Java, and Python3. If you are going to use one of these programming languages, use these starter files. For other programming languages, you need to implement a solution from scratch.
4.4 Compiling Your Program For solving programming assignments, you can use any of the following programming languages: C, C++, C#, Haskell, Java, JavaScript, Python2, Python3, Ruby, and Scala. However, we will only be providing starter solution files for C++, Java, and Python3. The programming language of your submission is detected automatically, based on the extension of your submission. We have reference solutions in C++, Java and Python3 which solve the problem correctly under the given restrictions, and in most cases spend at most 1/3 of the time limit and at most 1/2 of the memory limit. You can also use other languages, and we’ve estimated the time limit multipliers for them, however, we have no guarantee that a correct solution for a particular problem running under the given time and memory constraints exists in any of those other languages. Your solution will be compiled as follows. We recommend that when testing your solution locally, you use the same compiler flags for compiling. This will increase the chances that your program behaves in the
11
same way on your machine and on the testing machine (note that a buggy program may behave differently when compiled by different compilers, or even by the same compiler with different flags). ∙ C (gcc 5.2.1). File extensions: .c. Flags: gcc -pipe -O2 -std=c11