## Description

Background:

Searching for a certain value in a large database or list can take a long time especially if that

value is stored in a space that is the last place that you look. See example below:

Index 0 1 2 3 4 5 6 … 20483958 20483959 20483960 20483961

Value # # # # # # # … 49842 428 32532 35246

If you tried to find 32532 in the array using traditional brute force, it would take a long

time because 32532 is at the end of the array. You would have to check around 20483960

memory addresses before finding it.

A hash function is a function that generates a key for a value. This key allows us to

1. Strategically place your data

2. Easily find your data when you need it

There are many types of hashing, but we will focus on modular hashing.

ℎ(�) = � ��� �

• h(k): mapping

• k: key

• m: number of addresses

Another example:

293587

85023840

54234914

39482905

9850293

209524

98520398

623407

Use the numbers on the left as our example. There are 8 numbers in

our array and we want to arrange them in a way that allows us to find

them as quickly as possible. Because there are 8 spots in the array, we

are going to use mod 8 to define these locations for us. For example:

293587: 293587 mod 8 = 3. So, we’ll put this number in index 3

85023840:85023840 mod 8 =0. We’ll put this number in index 0

Continue this pattern….

Finally we’ll have an array that looks like below. This is called a hash table:

Let’s say I want to find or have access to a certain number such as

98520398.

If I wanted to use a brute force algorithm to find it in the array

above, it would take me 7 tries if I started from the beginning of the

array and 2 tries if I started at the end of the array.

However, if I want to find it in my hash table, I would plug it into

the hash function that I used to create the hash table x mod 8 and

I would get a remainder of 7. In 1 try, I can find the value I was

looking for.

0 85023840

1 39482905

2 54234914

3 293587

4 209524

5 9850293

6 98520398

7 623407

In real life, we can have collisions. To deal with collisions, there are multiple strategies that

programmers use, but we will cover linear probing ONLY. We won’t talk about chaining,

quadratic probing, or clustering in this class. You can learn that in cecs328

Collision: When a hash function maps two different keys to the same address

Linear probing: A technique used to overcome collision where the key is re-hashed by

placing the value in the next available slot in the table.

Example:

0

1

2 10 / 66

3

4

5

6

7

If you try to insert 10, you would put it in address 2 because 10 mod 8 = 2

If you try to insert 66, that would also give you address 2 because 66 mod 8 = 2

This is what we call a collision.

How to circumvent the issue (linear probing)

• Re-hash the value by placing it in the next available slot in the table

0

1

2 10

3 66

4

5

6

7

If you try to insert 83, you would put it in address 4 because 83 mod 8 = 3, but slots 3

is already taken. The next available position would be 4

If you try to insert 42, you would put it in address 5 because 42 mod 8 = 2, but slots 2,

3, 4 are already taken. The next available position would be 5

0

1

2 10

3 66

4 83

5 42

6

7

In the case that you reach the end of the hash table during linear probing, you want to loop

back around and continue checking from index 0.

Instructions:

1. Take a close look at the hashing.py file. There are some functions that have

already been implemented. In addition, there are two empty functions:

linear_probe(value, start_index) and hash(value). Read through both of

their descriptions carefully. Remember, you will lose points if you do not follow the

instructions. We are using a grading script.

Linear_probe(value, start_index)

input • value- value to be inserted

• start_index- where linear probing starts

output • returns the index of the hash_table that the

value should be inserted after linear probing

restrictions • Although you can implement this function with

just one input, DO NOT alter the function

heading

assumptions • value will always be an integer

• your table will always be big enough

to_hash(value)

input • value- value to be inserted

output • Do not return anything. Just insert value into

the proper position in self.table. Utilize

linear_probe and insert in this function

restrictions •

assumptions • value will always be an integer

• your table will always be big enough

2. Please note that a few functions are already completed: insert(value, index)

get_table() and __str__(). You are required to use the insert() function

in your hash() function. However, the other two functions get_table() and

__str__() are provided for debugging purposes (if you’re having issues with your

code) and do not have to be used in any of your functions. Do NOT alter the already

implemented functions in ANY significant way.

3. Your job is to implement both linear_probe() and hash() so that it passes any

test case. There are two sample test cases provided for you, but these are not the

only cases that we will test. We will be testing other test cases in the same way the

test cases are presented.

Note: The grading script will test linear_probe() and hash() separately!

Although you can implement hash() without linear_probe(), not

implementing linear_probe() will result in lost points.

Another note: For linear_probe(), you do not have to use both inputs. If you

think that you can implement this function with just one of the inputs, then you can

do that. Two inputs were provided for testing purposes.

4. After completing these functions, comment out the test cases (or delete them) or

else the grading script will pick it up and mark your program as incorrect. At the

minimum, the test cases must be commented/deleted. It is up to you if you want to

remove the checker() function.

5. Convert your hashing.py file to a .txt file. Submit your hashing.py file and

your .txt file on BeachBoard. Do NOT submit it in compressed folder.

6. Do not email us your code asking us to verify it. We will answer general questions,

but we will not debug your code over email.

Grading rubric

Points Requirement

15 Implemented linear_probe() correctly

15 Implemented hash() correctly

5 Passes 2 original test cases (also commented out or deleted the test

cases)

*** Note: If your program has an error, you will automatically get a 0 on the lab! Double

check to make sure that you do not have any errors!!!

** If we find that you have significantly altered insert() or get_table(), then you will

get an automatic 0! Regardless of your other functions being correct or not.