Description

5/5 - (1 vote)

Exercise 1 (60 points). Building a Hash Table
We want to compare the performance of hash tables implemented using chaining and open
addressing. In this assignment, we will consider hash tables implemented using the multiplication
and linear probing methods. Note that the multiplication method described here is slightly different from the one that was seen in class, but the principle remains the same. We will (respectively)
call the hash functions h and g and describe them below. Note that we are using the hash function
h to define g.
Collisions solved by chaining (multiplication method): h(k) = ((A · k) mod 2w) >> (w − r)
Open addressing (linear probing): g(k, i) = (h(k) + i) mod 2r
In the formula above, r and w are two integers such that w > r, and A is a random number
such that 2
w−1 < A < 2
w. In addition, let n be the number of keys inserted, and m the number of slots in the hash tables. Here, we set m = 2r and r = dw/2e. The load factor α is equal to n
m
.
We want to estimate the number of collisions when inserting keys with respect to keys and the
choice of values for A.
We provide you a set of two template files within COMP251HW1.zip that you will complete.
This file contains two classes, one for each hash function. Those contain several helper functions,
namely generateRandom that enables you to generate a random number within a specified range.
Please read the provided code describing the hashtable classes with attention.
Your first task is to complete the two java methods Open_Addressing.probe and Chaining.chain.
These methods must implement the hash functions for (respectively) the linear probing and multiplication methods. They take as input a key k, as well as an integer 0 ≤ i < m for the linear
probing method, and return a hash value in [0, m[.
Next, you will implement the method insertKey in both classes, which inserts a key k into
the hash table and returns the number of collisions encountered before insertion, or the number of
collisions encountered before giving up on inserting, if applicable. Note that for this exercise, we
define the number of collisions in open addressing as the number of keys encountered, or “jumped
over” before inserting or removing a key (note that this definition only makes sense if the key is
in the hash table). For chaining, we simply consider the number of other keys in the same bin at
the time of insertion as the number collisions. You can assume the key is not negative, and that
we will not attempt to insert a key that already exists in the hash table.
You will also implement a method removeKey, this one only in Open_Addressing. This method
should take as input a key k, and remove it from the hash table while visiting the minimum
number of slots possible. Like insertKey, it should output the number of collisions if the key
is found.
If the key is not in the hash table, the method should simply not change the hash table, and
output the number of slots visited before giving up.
You will notice from the code and comments that empty slots are given a value of −1. If
applicable, you are allowed to use a different notation of your choice for slots containing a deleted
element.
Make sure to test your assignment thoroughly by thinking about all the different situations
that can occur when dealing with hash tables. Build your own hash table and try inserting and
removing keys!
For this question, you will need to submit your Chaining.java and Open_Addressing.java
source files to the Assignment 1 => Q1 – Hash lesson in Ed-Lessons. You will not be tested on
execution time for this question, but you will be tested on the efficiency of your program in terms
of number of steps. You must implement your own hash table. Using the built-in hash
table from Java will result in a 0 on this question.
Exercise 2 (50 points). Building a Disjoint Set
We want to implement a disjoint set data structure with union and find operations. The
template for this program is available on the course website and named DisjointSets.java.
In this question, we model a partition of n elements with distinct integers ranging from 0 to n − 1
(i.e. each element is represented by an integer in [0, · · · , n − 1], and each integer in [0, · · · , n − 1]
represent one element). We choose to represent the disjoint sets with trees, and to implement the
forest of trees with an array named par. More precisely, the value stored in par[i] is parent of
the element i, and par[i]==i when i is the root of the tree and thus the representative of the
disjoint set.
You will implement union by rank and the path compression technique seen in class. The rank is
an integer associated with each node. Initially (i.e. when the set contains one single object) its
value is 0. Union operations link the root of the tree with smaller rank to the root of the tree
with larger rank. In the case where the rank of both trees is the same, the rank of the new root
increases by 1. You can implement the rank with a specific array (called rank) that has been
added to the template, or use the array par (this is tricky). Note that path compression does not
change the rank of a node.
Download the file DisjointSets.java, and complete the methods find(int i) as well as
union(int i, int j). The constructor takes one argument n (a strictly positive integer) that
indicates the number of elements in the partition, and initializes it by assigning a separate set to
each element.
The method find(int i) will return the representative of the disjoint set that contains i (do not
forget to implement path compression here!). The method union(int i, int j) will merge the
set with smaller rank (for instance i) in the disjoint set with larger rank (in that case j). In that
case, the root of the tree containing i will become a child of the root of the tree containing j),
and return the representative (as an integer) of the new merged set. Do not forget to update the
ranks. In the case where the ranks are identical, you will merge i into j.
Once completed, compile and run the file DisjointSets.java. It should produce the output
available in the file unionfind.txt available on MyCourses.
Exercise 3 (90 points). Improving our discussion board
The teaching staff in Comp251 is really happy of how our discussion board (Ed) is working;
however, we believe there is one function missing. This function will allow us to identify important
topics (discussed in Ed) by filtering key words. In particular, given a list of messages posted in
Ed, we want a function that reports the words used by every single user on the discussion board.
This list must be sorted from most to least used word (i.e., the word with the highest frequency
must be the first one). In case of a frequency tie, the word needs to be sorted in alphabetical
order.
Let’s see now some features of the discussion board posts. The list of post will be provided to
you as an array of strings (String[]), where every slot in the array will contain a message. All
messages will have the following characteristics.
• Each message is represented in Java as a String
• Each message begins with a user’s name of no more than 20 characters.
• After the name, each message continues with the content of that user’s post all in lower case.
• The total number of characters across all messages, including spaces, will not exceed 2 ∗ 106
Let’s see now two examples to make sure that everything is clear. Given the following list of
posts:
David no no no no nobody never
Jennifer why ever not
Parham no not never nobody
Shishir no never know nobody
Alvin why no nobody
Alvin nobody never know why nobody
David never no nobody
Jennifer never never nobody no
Your algorithm must return the array [no, nobody, never] (exactly in that order). Those
three words were used by every single user of our discussion board and they are reported in order
of frequency (i.e., “no” is the most frequent used word). In case of a tie, the order was decided
lexicographically.
Now, if the following list of post is given to you:
David comp
Maria music
Your algorithm must return an empty array [] given that any of the words in the post was
used by every single user.
For this question, you must implement your solution in the function Discussion_Board(String[]
posts) which is inside the class/file A1_Q3.java. Please notice that for this question the correctness and efficiency of your algorithm will be tested, then it is of your interest to code your solution
using the right algorithms and data-structures. Please notice that for this question, it is forbidden to create new classes.
Exercise 4 (0 points). Least common multiple This optional problem is intended to prepare
you for the midterm. We will provide solutions, but you will not receive marks for completing
it. here, we aim to study an algorithm that computes, for an integer n ∈ N, the least common
multiple (LCM) of integers ≤ n.
For a given integer n ∈ N, let Pn = p
x1
1 p
x2
2
· · · p
xk
k
, where p1, p2, · · · , pk is a strictly increasing
sequence of prime numbers between 2 and n and for each i ∈ {1, · · · k}, xi
is the integer such that
p
xi
i ≤ n < pxi+1
i
. For example, P9 = 23
· 3
2
· 5 · 7.
More precisely, we’re going to compute all Pj
, j ∈ {1, · · · , n} and store pairs of integers (p
α
, p)
in a heap, a binary tree where the element stored in the parent node is strictly smaller than
those stored in children nodes. For two given pairs of integers (a, b) and (a
0
, b0
), (a, b) < (a
0
, b0
)
if and only if a < a0
. Let h denotes the tree height, we admit that h = Θ(log n). All levels of
the binary tree are filled with data except for the level h, where elements are stored from the left
to the right. After computing Pj
, all pairs (p
α
, p) are stored in the heap such that p is a prime
number smaller or equal to j and α is the smallest integer such that j < p
α
. For instance, after
computing P9, we store (16, 2), (27, 3), (25, 5), and (49, 7) in the heap.
The algorithm is iterative. We store in the variable LCM the least common multiple computed
so far. At first, LCM= 2 is the LCM of integers smaller than 2 and the heap is constructed with
only one node with value (4, 2). After finish the (j − 1)-th step, we compute the j-th step as
follows:
1. If j is a prime number, multiply LCM by j and insert a new node (j
2
, j) in the heap.
2. Otherwise, if the root (p
α
, p) satisfies j = p
α
, then we multiply LCM by p, change the root’s
value by (p
α+1, p), and reconstruct the heap.
We’re going to prove, step by step, that the time complexity of this algorithm is O(n
√
n).
3.1 – 0 points
In operation 1, a new node is inserted. What is the complexity of this operation?
3.2 – 0 points
In operation 2, the heap is reconstructed. What is the time complexity of this operation?
3.3 – 0 points
The number of prime numbers smaller than n concerned in the operation 2 is less than √
n. Prove
that the number of times N we need to execute operation 2 to compute Pn is asymptotically
negligible compared to n. Tip: you can prove this by proving N is o(n), where o (little o) denotes
a strict upper bound.
3.4 – 0 points
Assume the complexity of assessing whether an integer is a prime number is √
n and suppose
multiplication has a time complexity of 1. Prove that the algorithm’s complexity is O(n
√
n).
3.5 – 0 points
Prove that, for a given heap of height h with n nodes, we have h = Θ(log n). No partial credit
will be awarded.
What To Submit?
Attached to this assignment are java template files. You have to submit only this java files. Please
DO NOT zip (or rar) your files, and do not submit any other files.
Where To Submit?
You need to submit your assignment in ed – Lessons. Please review the tutorial 2 if you still have
questions about how to do that (or attend office hours). Please note that you do not need to
submit anything to myCourses.
When To Submit?
Please do not wait until the last minute to submit your assignment. You never know what could
go wrong during the last moment. Please also remember that you are allowed to have multiple
submission. Then, submit your partial work early and you will be able to upload updated versions
later (as far as they are submitted before the deadline).
How will this assignment be graded?
Each student will receive an overall score for this assignment. This score is the combination of
the passed open and private test cases for the three questions of this assignment. The open cases
correspond to the examples given in this document plus other examples. These cases will be run
with-in your submissions and you will receive automated test results (i.e., the autograder output)
for them. You MUST guarantee that your code passes these cases. In general, the private test
cases are inputs that you have not seen and they will test the correctness of your algorithm on
those inputs once the deadline of the assignment is over; however, for this assignment you will
have information about the status (i.e., if it passed or not) of your test. Please notice that not all
the test cases have the same weight.
Student Code of Conduct Assignment Checklist
The instructor provides this checklist with each assignment. The instructor checks the boxes to
items that will be permitted to occur in this assignment. If an item is not checked or not present
in the list, then that item is not allowed. The instructor may edit this list for their case. A student
cannot assume they can do something if it is not listed in this checklist, it is the responsibility of
the student to ask the professor (not the TA).
Instructor’s checklist of permitted student activities for an assignment:
Understanding the assignment:
Read assignment with your classmates
Discuss the meaning of the assignment with your classmates
Consult the notes, slides, textbook, and the links to websites provided by the
professor(s) and TA(s) with your classmates (do not visit other websites)
Use flowcharts when discussing the assignment with classmates.
Ask the professor(s) and TA(s) for clarification on assignment meaning and
coding ideas.
Discuss solution use code
Discuss solution use pseudo-code
Discuss solution use diagrams
Can discuss the meaning of the assignment with tutors and other people
outside of the course.
Look for partial solutions in public repositories
Doing the assignment:
● Writing
Write the solution code on your own
Write your name at the top of every source file with the date
Provide references to copied code as comments in the source code
(e.g. teacher’s notes). Please notice that you are not allowed to copy
code from the internet.
Copied code is not permitted at all, even with references
Permitted to store partial solutions in a public repository
● Debugging
Debug the code on your own
Debugging code with the professor
Debugging code with the TA
Debugging code with the help desk
Debugging code with the Internet. Please notice that this is allowed to
debug syntax errors, no logic errors.
You can debug code with a classmate
You can debug code with a tutor or other people outside of the course
● Validation
Share test cases with your classmates
● Internet
Visit stack-overflow (or similar). Please notice that this is allowed only
to debug syntax errors, no logic errors.
Visit Chegg (or similar)
● Collaboration
Show your code with classmates
Sharing partial solutions with other people in the class
Can post code screenshots on the course discussion board
Can show code to help desk
Submitting and cleaning up after the assignment:
Backup your code to a public repository/service like github without the
express written permission from the professor (this is not plagiarism, but it
may not be permitted)
Let people peek at your files
Share your files with anyone
ZIP your files and upload to the submission box
Treat your work as private
Make public the solutions to an assignment
Discuss solutions in a public forum after the assignment is completed

Solved Comp 251: Assignment 1 Exercise 1 (60 points). Building a Hash Table

Download Details:

Description

Solved Comp 251: Assignment 1 Exercise 1 (60 points). Building a Hash Table

Download Details:

Description

Related products

Assignment 3 COMP 250 solved

CS3353 Programming assignment 3 solution

CS3100/5100: Data Structures and Algorithms Programming Assignment #4 solution