Description
CSCI241 Assignment 1
1 Overview
In this assignment, you will implement four sorting algorithms, an interactive command-line program that demonstrates the sorting algorithms, and create a write-up analyzing the number of comparisons performed by each sort as a function of input array size.
Your primary tasks are as follows: • Implement the methods for insertion, merge, quick, and radix sorts in Sorts.java. You will also need to implement the merge and partition helper methods for merge sort and quick sort, respectively. • Implement the user-facing behavior described below in SortsDriver, using the sorting methods from Sorts.java to perform the sorting.
• Create a writeup analyzing how number of comparisons grows as a function of input array size for insertion, merge, and quicksort. • The base assignment is worth 45 points; to earn the remaining 5 points (or more, for extra credit), you can complete some of the enhancements described in Section 6 (or come up with your own) and explain what you did in your writeup.
2 Getting Started
The Github Classroom invitation link for this assignment is in Assignment 1 on Canvas. Begin by accepting the invitation and cloning a local working copy of your repository as you did in Lab 1. Make sure to clone it somewhere outside your lab1 working copy (e.g., ~/csci241/a1) 1 to avoid nesting local repositories. Skeleton code is provided in your repository to get you started. See Section 7 below for a suggested game plan for getting the sorts, user interface, and writeup done. The following sections provide details and hints on each subtask.
3 Sorting Algorithms
Sorts.java contains method headers for six public methods: • insertionSort • merge • mergeSort • partition • quickSort • radixSort The method headers and specifications (i.e., the name, return type, parameters, and the Javadoc comment specifying the method’s behavior should not be changed. If you change method names, call signatures, or return values, your code will not compile with the testing system and you’ll receive no credit for the correctness portion of your grade. Public methods must implement their specifications precisely and completely.
For example, even if your quickSort method always calls partition using the first element as the pivot, partition is still required to work with any other legal pivot index specified, because such behavior is prescribed in the specification. In Lab 2, you will write unit testing code that will help you check the correctness of the sorting methods. As you develop the sorts, you should run gradle test frequently and make sure that each algorithm passes all its tests before moving on to the next.
3.1 Implementation Notes • You may write and use as many private helper methods as you need. • The mergeSort and quickSort implementations must be recursive and all sorts must be asymptotically efficient. • The Sorts class has a private member comparisonCount. In each of the sorts that you implement, use this counter to tally the count of comparisons that are done between entries of the array as it is sorted. For example, for insertionSort, tally the number of times that array[j] < inputArr[j-1] is performed.
For quickSort, tally the number of times that A[j] is compared to the pivot, etc. Be sure to count all comparisons made, not just those that evaluate to true. You do not need to count comparisons among loop variables (e.g., you should ignore the i < n comparison in a for loop header). • The bottom of Sorts.java has two private helper methods written for you that you may find useful.
2 • Radix sort requires the use of a stable sorting algorithm to sort the array on each digit. You can either use counting sort (see CLRS 8.2) or maintain a list of queues, one to store the contents of each bucket. Counting sort is algorithmically trickier. On the other hand, creating an array of queues of integers in Java can be a bit painful because of the way generics and arrays interact.
The following snippet creates and populates an ArrayList of 10 buckets, each of which is a LinkedList of integers: ArrayList<LinkedList> buckets = new ArrayList<LinkedList>(10); for (int i = 0; i < 10; i++) { buckets.add(new LinkedList()); } Because buckets is an ArrayList, use buckets.get(i) to get the LinkedList storing the i’th digit.
Remember that a LinkedList implements the Queue interface; see the Java documentation for details on which methods make it behave like a Queue. • Radix sort as described in class does not naturally play well with negative integers. Get it working on nonnegative numbers first, then figure out how to handle negatives. You may assume that the values to be sorted are not extremely large or small and do not approach the largest or smallest value that can be represented using an int.
4 Interactive Program Behavior
The main method of SortsDriver should implement a program that behaves as follows. To run the program, you can simply use gradle run. When the program starts, it should: 1. Prompt the user for the size of the array, n, and create an array of that size made up of integer values chosen randomly from -n..n+1. 2. Prompt the user to specify which sort to use (merge sort, quick sort, insertion sort, radix sort, or all).
The user should be asked to enter a single letter: m, q, i and r, or a. 3. If all (a) sorts is specified, the input to each sort must be identical 4. If n ≤ 20, the pre-sorted and sorted array’s contents are printed for each sort invoked 5. If n > 20, the pre-sorted and sorted array’s contents are NOT printed for each sort invoked 6. The count of comparisons performed is/are output. Several sample invocations of the program are shown in Figure 1. Note the order of the prompts must be as specified, though the text does not need to be precisely the same as the example.
4.1 Implementation Notes • Error catching is NOT required. You do not have to catch if a user specifies a negative count of entries, or inputs a letter, or provides a sort option that is not m, i, q, r nor all. Consider using a switch statement. • The java.util.Random and java.util.Scanner classes from the Java Standard Library may come in handy. 3 % java SortsDriver Enter sort (i[nsertion], q[uick], m[erge], r[adix], a[ll]): q Enter n (size of array to sort): 15 Unsorted: [ -12 5 -9 15 -7 13 9 13 10 -7 14 -14 -14 2 -10 ] Sorted: [ -14 -14 -12 -10 -9 -7 -7 2 5 9 10 13 13 14 15 ]
Comparisons: 42 % java SortsDriver Enter sort (i[nsertion], q[uick], m[erge], r[adix], a[ll]): m Enter n (size of array to sort): 1000 Comparisons: 8709 % java SortsDriver Enter sort (i[nsertion], q[uick], m[erge], r[adix], a[ll]): r Enter n (size of array to sort): 12 Unsorted: [ 4 1 -8 5 -2 12 -7 -4 -11 -7 -5 -8 ] Sorted: [ -11 -8 -8 -7 -7 -5 -4 -2 1 4 5 12 ] Comparisons: 0 % java SortsDriver Enter sort (i[nsertion], q[uick], m[erge], r[adix], a[ll]): a Enter n (size of array to sort): 10 Unsorted: [ -9 9 -6 -8 -4 0 -7 -3 6 8 ] insertion: 13 Sorted: [ -9 -8 -7 -6 -4 -3 0 6 8 9 ] quick: 21 Sorted: [ -9 -8 -7 -6 -4 -3 0 6 8 9 ] merge: 23 Sorted: [ -9 -8 -7 -6 -4 -3 0 6 8 9 ] % java SortsDriver Enter sort (i[nsertion], q[uick], m[erge], r[adix], a[ll]): a Enter n (size of array to sort): 1000 insertion: 248189 quick: 10453 merge: 8695
Figure 1: Sample Invocations of SortDriver.java 4 • Do not use System.console to read user input. • Do not create more than one Scanner object reading from System.in. Re-use the same Scanner object for all user input. • To ensure that the all option works as intended, you’ll need to make a deep copy of the randomly generated array to give to each sort method. • For the all option, avoid counting comparisons for multiple sorts: either reset the comparison counter (there’s a handy method for this provided in Sorts.java) or create a fresh Sorts object for each sort.
• Precise comparison counts may differ based on subtle implementation choices, even across multiple correct solutions. However, the relative counts between insertion sort O(n 2 ) and quick sort O(n log2 n), for example, should differ greatly and clearly demonstrate their relative run-times. • As described in the style guide on the syllabus and the rubric at the end of this document, overly long methods (e.g., with hundreds of lines of code) are considered bad programming style.
Be sure that your program is broken down into sensible, modular helper methods, rather than a monolithic main method. 5 Writeup Your writeup should include the following: • Table. For the insertion, quick, and merge sorts, run each of them on successively larger arrays, and tally the count of comparisons made as a function of n, the array size. Vary n from 10 to 500 and list the data in a table. See Figure 2 as an example. Notice that radix sort is not listed here; you be able to explain why this is the case. n Insertion Quick Merge 10 33 21 24 50 607 234 222 100 2459 550 528 200 11353 1401 1282 500 58535 4682 3852 Figure 2: Sample tally table • Plot. Using graphing software of your choice, create a plot of the values you have tallied. Be sure to label both the x and y axes, provide a title, and make sure that you provide a legend to make it easy to identify merge, insertion, and quick sort performance. A sample plot is shown in Figure 3.
• Discussion. Describe any design decisions you made along the way. If you complete any of the enhancements (see below), describe what you did and include any associated plots or analysis. • Acknowledgements If applicable, list any people you collaborated with and any external resources (i.e., anything other than the recommended texts, lecture slides, and help in office hours) you used in the course of completing this assignment 5 Figure 3: Sample plot of comparisons done by insertion, merge, and quicksort 6 Enhancements The base assignment is worth 45/50 points.
The final 5 points may be earned by completing one or more of the following enhancements. You may also come up with your own ideas – you may want to run them by the instructor to make sure they’re worthwhile and will result in points awarded if successfully completed. It is highly recommended that you complete the base assignment before attempting any enhancements. Enhancements and git The base project will be graded based on the master branch of your repository.
Before you change your code in the process of completing enhancements, create a new branch in your repository (e.g., git checkout -b enhancements). Keep all changes related to enhancements on this branch—this way you can add functionality, without affecting your score on the base project. For example, the first enhancement asks you to add a third user prompt to choose between a sorted array and a random array. As this departs from the user-facing behavior described in the base project, such a change should only be made in your enhancements branch.
Make sure you’ve pushed both master and enhancements branches to GitHub before the submission deadline. 1. (1 point) Real-world sorting inputs rarely come in uniformly random order. Add a prompt that allows the user to choose among the following arrays that try to model some likely real-world use cases: • A random array (as in the base project) • A fully sorted array • An array that is sorted, except the last 5% of its values are random • An array in which 90% of elements are, amongst themselves, in sorted order, while the other 10% are random. Tip: the java.util.Random class’s nextDouble() method generates a random floating-point value between 0.0 and 1.0. Generate another plot for each of these types of arrays.
Describe the differences—which ones can you explain? Are any surprising/inexplicable? 2. (2 points) Make a table and plot of performance in terms of elapsed time instead of number of comparisons. You may find the built-in System.nanoTime() function useful. 3. (2 points) Implement the median-of-three (first, middle, last) pivot in quicksort. Plot the number of comparisons done by both variants of quickSort and insertionSort. (1 additional 6 point) Repeat this experiment, but run the sorts on sorted arrays and nearly-sorted arrays from Enhancement 1. 4. (Up to 4 points) Most real-world sorts built into modern programming languages are hybrid algorithms that combine more than one algorithm depending on the array size, ordering, etc.
Implement a hybrid sorting algorithm and analyze its performance relative to the other sorts. You may find it interesting to note differences in performance measured by number of comparisons vs elapsed time. Try to outperform both quicksort and insertionsort on random, sorted, and mostly-sorted arrays. You may search the internet for inspiration and strategies, but please cite your sources, write your own code, and explain your algorithm in the writeup. A good-faith attempt that does not beat insertion and quicksort may still receive some credit.
7 Game Plan Start small, test incrementally, and git commit often. Please keep track of the number of hours you spend on this assignment; when you are finished, write the approximate number of hours you spent as a lone integer in hours.txt and make sure it is included when you push your final changes to github to submit. Hours spent will not affect your grade. A suggested approach is outlined below: 1. Implement the insertionSort method in Sorts.java. Use the skeleton code provided for you in SortsDriver.java test out the insertionSort method to sort on a small fixed array. When you have this working, commit your changes to git.
2. In SortsDriver, implement and test random array generation. Prompt the user for array size and which sort, and add functionality to print the array before and after sorting. 3. Implement each of the remaining sorts, testing using both the driver and the unit tests as you go. Commit your changes git frequently, and push to github each time you get a new piece working. Recommended order: merge, quick, radix. 4. Implement the all option. Avoid copy/pasting code you’ve already written—instead, factor useful pieces into private methods that you can call more than once. Commit your changes. 5. Create the table and plot in your writeup.
Don’t forget axis labels, legend, and title. 6. If you plan to do any enhancements, create an enhancements branch in your repository to track the changes beyond the base project (git checkout -b enhancements). Commit all new changes to this branch and don’t merge it into the master branch. Add a description and analysis of each enhancement to your writeup. 7. Fill in hours.txt, run the tests one last time, and read through the rubric to make sure you’ve finished everything.
Read through the code clarity section to be sure you won’t lose style points; see the syllabus for more details on coding style. 8 How and What to Submit Submitting the assignment is as simple as pushing your final changes to GitHub (git push origin master or just git push) before the deadline. Don’t forget that committing changes 7 does not automatically push them to GitHub! Submit the writeup with runtime tallies, plot, and description of any enhancements in pdf format on Canvas.
8 Rubric For the coding portion of each assignment, you earn points for the correctness and efficiency of your program. Points can be deducted for errors in commenting, style, clarity, etc. Hours Code is pushed to github and hours spent appear as a lone integer in hours.txt 1 point Writeup Tally table of comparison counts for quicksort, merge sort, and insertion sort. 3 Plot of tallied numbers, including axis labels, title, and legend. 3 Code : Correctness Sorting algorithms and helper methods are implemented correctly as determined by unit tests (1 point per test).
20 Program prompts user for number of integers desired, relies on random number generator to populate the array, and prompts for type of sort to run (m, i, q, and r, a). 3 Each invocation of a sort correctly tallies the count of comparisons made. 3 When the a option is specified, all four sorts are invoked, whose input is the same array. 2 If n ≤ 20, pre-sorted and sorted array(s) are printed; otherwise, the arrays are not printed. 2 Code : Efficiency Mergesort runs in O(n log n) 2 QuickSort runs in O(n log n) in the expected case 2 Insertion sort runs in-place, and runs O(n 2 ). 2 Radix makes a constant number of O(n) passes over the input.
2 Enhancements See Enhancements section above. up to 5 Clarity deductions (up to 2 points each) Include author, date and purpose in a comment comment at the top of each file you write any code in Methods you introduce should be accompanied by a precise specification Non-obvious code sections should be explained in comments Indentation should be consistent Method should be written as concisely and clearly as possible Methods should not be too long – use private helper methods Code should not be cryptic and terse Variable and function names should be informative Total 50 points
CSCI241 Assignment 2
1 Overview
You are given a partial implementation of a Binary Search tree in AVL.java. Your task will be
to complete this implementation and turn the BST into an AVL tree by maintaining the AVL
balance property on insertion. You will use this AVL tree to efficiently count the number of
unique words in a text document.
2 Getting Started
The Github Classroom invitation link for this assignment is in Assignment 2 on Canvas. Begin
by accepting the invitation and cloning a local working copy of your repository as you did in
Assignment 1. Make sure to clone it somewhere outside the local working copies for other
assignments and labs (e.g., clone to ~/csci241/a2) to avoid nesting local repositories.
3 Program Behavior: User’s perspective
Some helpful skeleton code for the word-counting application is provided in Vocab.java.
Each command-line argument to the Vocab program is a text file. The program reads words
from a text file, removing whitespace and punctuation, and normalizing to all lower case – this
means “Band” and “band,” will both come out as “band” and you won’t count the same word
as two different ones.
For each text file, your program should then print two numbers on a single line:
1. The number of unique words used in the document.
2. The total number of words used in the document.
If the program receives no command-line arguments, it should read text from standard in
(System.in) until an end-of-file character is reached (in linux you can send the EOF character
to a process by pressing Ctrl+d; in Windows, I’m told the shortcut is Ctrl+z). If this doesn’t
work, try pressing the same shortcut twice in a row.
Some sample invocations of the program appear below. The user typing the EOF character
is represented as ^D. Note that the particulars of the gradle output may differ depending on the
particular state of your gradle build, but the inputs and outputs of a working solution should
match those shown in the examples below.
$ gradle run
> Task :compileJava UP-TO-DATE
> Task :processResources NO-SOURCE
> Task :classes UP-TO-DATE
Words, words, words.
What is the matter, my lord?
Between who?
I mean, the matter that you read, my lord.^D
> Task :run
14 20
BUILD SUCCESSFUL in 18s
2 actionable tasks: 1 executed, 1 up-to-date
$ gradle run –args=”a.txt b.txt”
> Task :compileJava UP-TO-DATE
> Task :processResources NO-SOURCE
> Task :classes UP-TO-DATE
> Task :run
14 20
1955 8060
BUILD SUCCESSFUL in 0s
2 actionable tasks: 1 executed, 1 up-to-date
$ gradle run –args=”b.txt”
> Task :compileJava UP-TO-DATE
> Task :processResources NO-SOURCE
> Task :classes UP-TO-DATE
> Task :run
1955 8060
BUILD SUCCESSFUL in 0s
2 actionable tasks: 1 executed, 1 up-to-date
4 Your Tasks
Skeleton code is provided in your repository. The AVL class in src/main/java/avl/AVL.java
currently implements the search functionality for a BST.
1. Implement standard BST (not AVL) insert functionality in the provided bstInsert
method stub. As with search, the AVL class has a public bstInsert(String w) method
that calls a private bstInsert(Node n, String w) method that recursively inserts on
nodes. Notice that AVL class has a size field that should be kept up to date as words are
inserted. Note: bstInsert does not need to keep heights up-to-date; this is only necessary in avlInsert, and you can assume that bstInsert calls are not mixed with avlInsert
calls.
2. Implement leftRotate and rightRotate helper methods to perform a rotation on a
given node. Use the lecture slides as a reference.
3. Implement rebalance to fix a violation of the AVL property caused by an insertion. In the
process, you’ll need to correctly maintain the height field of each node. Remember that
height needs to be updated any time the tree’s structure changes (insertions, rotations).
4. Implement avlInsert to maintain AVL balance in the tree after insertions using the
rebalance method.
5. Use your completed AVL tree class to efficiently (O(n log n)) count the number of unique
words found in each document processed by Vocab. Because insertion ignores duplicates,
the number of unique words will simply be the size of the tree.
6. For the final 5 points, complete some or all of the series of enhancements described below.
7. Record the total number of hours you spent on this assignment in the provided “hours.txt”
file. The file should contain one line with a single integer representing the estimated time
you took to complete the project.
4.1 Implementation notes
• Public method specifications and signatures should not be changed: if method
names, call signatures, or return values change, your code will not compile with the testing
system and you’ll receive no credit for the correctness portion of your grade.
• You may write and use as many private helper methods as you need. You are especially
encouraged to use helper methods for things like calculating balance factors, updating
heights, etc., in order to keep the code for intricate procedures like rebalance easy to
read.
• The skeleton code implements a try/catch block to skip nonexistent files in Vocab.java.
Error catching beyond this is not required – you may assume well-formed user input and
that method preconditions will not be violated.
• Be careful with parent pointers. A recursive tree traversal such as the reverse-in-order
traversal used in printTree never follows parent pointers; this means parent pointers can
be misplaced and printTree will still look normal.
• Keep in mind that the height method from Lab 4 is O(n), which means you can’t call
it on every node and expect to maintain the O(n log n) runtime in AVLInsert: instead
you need to update the height of each node along the insertion path from the bottom up,
updating each node’s height field using the heights of its children.
• You are provided with a test suite in src/test/java/avl/AVLTest.java. Use gradle
test often and pass tests for each task before moving onto the next.
5 Enhancements
The base assignment is worth 45/50 points. The final 5 points may be earned by completing
the following enhancements. You may also come up with your own ideas – you may want to
run them by the instructor to make sure they’re worthwhile and will result in points awarded
if successfully completed. It is highly recommended that you complete the base assignment
before attempting any enhancements.
Enhancements and git The base project will be graded based on the master branch of
your repository. Before you change your code in the process of completing enhancements, create
a new branch in your repository (e.g., git checkout -b enhancements). Keep all changes
related to enhancements on this branch—this way you can add functionality, without affecting
your score on the base project. Make sure you’ve pushed both master and enhancements
branches to GitHub before the submission deadline.
The final five points can be earned as follows:
1. (2 points) Implement remove using standard BST removal (without maintaining AVL
balance).
2. (1 point) Modify your remove method to maintain AVL balance through removals.
3. (1 point) Add a count field to the Node class and modify avlInsert and remove to
maintain count as the net number of additions/removals of a word. In other words,
count should store the number of times the word has been added minus the number of
times it has been removed. If count gets to zero, remove the node from the tree.
4. (1 point) If a single file is specified as a command line argument, print the number of
unique words over a fixed-width sliding window as the text is processed. For example, for
a window size of 30, maintain a set of the most recent 30 words; after each word is read,
print the number of unique words in the most recent 30, separated by newlines.
If you complete any of the above, explain what you did and any design decisions made in a
comment at the top of the corresponding java file.
6 Game Plan
Start small, test incrementally, and git commit often. Please keep track of the number of hours
you spend on this assignment, as you will be asked to report it in hours.txt. Hours spent will
not affect your grade.
The tasks are best completed in the order presented. Make sure you pass the tests for the
current task before moving on to the next. Rotations and rebalancing are the trickiest part.
Visit the mentors, come to office hours, or post on Piazza if you are stuck. A suggested timeline
for completing the assignment in a stress-free manner is given below:
1. By the end of Sunday 5/3: BST insertion completed and tested.
2. By the end of Tuesday 5/5: rotations implemented and tested.
3. By the end of Thursday, 5/7: rebalance implemented and tested.
4. By Saturday, 5/9 AVL insertion and Vocab behavior implemented and tested.
5. By Monday, 5/11: Any enhancements completed, hours recorded, and final changes
pushed to GitHub.
7 How and What to Submit
Submit the assignment by pushing your final changes to GitHub before the deadline. Be sure
to fill in hours.txt with a single integer representing the estimated number of hours you spent
on this assignment. Feel free to tell me how the assignment went on the line below the number
of hours. If you completed any enhancements, be sure to push your enhancements branch as
well.
Rubric
You can earn points for the correctness and efficiency of your program, and points can be
deducted for errors in commenting, style, clarity, and following assignment instructions.
Git Repository
Code is pushed to github and hours spent appear as a lone integer in hours.txt 1 point
Code : Correctness
Unit tests (n tests passed earns d1.33ne points) 28
Vocab prints the unique and total counts of words from standard input 3
Vocab prints the unique and total counts of words from each command-line argument
Code : Efficiency
avlInsert maintains O(log n) performance by keeping track of node heights and
updating them as necessary
Vocab processes a document with n words in O(n log n) time 5
Enhancements
remove correctly removes a node from the tree 2
remove maintains AVL balance 1
The tree tracks a count for each node, inserting or incrementing on insertion and
decrementing or removing on removal.
Sliding window vocabulary tracking is implemented. 1
Clarity deductions (up to 2 points each)
Include author, date and purpose in a comment comment at the top of each file
you write any code in
Methods you introduce should be accompanied by a precise specification
Non-obvious code sections should be explained in comments
Indentation should be consistent
Methods should be written as concisely and clearly as possible
Methods should not be too long – use private helper methods
Code should not be cryptic and terse
Variable and function names should be informative
Total 50 points
CSCI241 Assignment 3
1 Overview
A3 consists of three phases.
1. Implement a standard min-heap in Heap.java.
2. Implement a hash table in HashTable.java.
3. Augment your heap in Heap.java to implement efficient contains and changePriority
methods.
See the lecture slides from Lecture 17 for some additional context.
2 Getting Started
The Github Classroom invitation link for this assignment is in Assignment 3 on Canvas. Begin
by accepting the invitation and cloning a local working copy of your repository as you have for
past assignments.
The Heap class depends on the AList class you wrote for Lab 5. Copy your AList.java from
your lab 5 repository into your A3 repo’s src/main/java/heap/ directory.
3 Unit Tests and Gradle
AListTest.java, from Lab 5, has been included along with some small updates to catch a few
corner case bugs. If your implementation fails any of the AList tests, fix that first.
You are provided with unit tests for each phase in A3Test.java. The test methods are numbered
with three-digit numbers; the hundreds place indicates which phase the test pertains to. Test
often, and make sure you pass any tests associated with a TODO item before moving on to the
next one. You may find it helpful to run only the tests for the phase you’re currently working
on using the –tests flag with a wildcard; for example to run only the phase 2 tests, you could
run
gradle test –tests “test2*”
4 Phase 1
In Phase 1, you’ll complete the implementation of a min-heap given in the skeleton repository’s
Heap.java file. For details on how these operations work, see the slides from Lectures 13 and
14.
The tasks listed below are marked in Heap.java as TODO 1.0, TODO 1.1, and so on. Phase 3
involves further modifications to Heap.java. For now, ignore anything in the code
marked as Phase 3 Only, or TODO 3.x.
1.0 Read and understand the class invariant in the comment at the top of Heap.java. Ignore
the Phase 3 parts for now. This specifies the properties the Heap must satisfy. All public
methods are responsible for making sure that the class invariants are true before the
method returns.
1.1 Implement the add method, using the bubbleUp helper according to its specification.
1.2 Implement the swap helper method, which you’ll use in both bubbling routines.
1.3 Implement the bubbleUp helper method. Feel free to use private helper methods to keep
this code clean.
1.4 Implement peek. Recall that peek returns the minimum element without modifying the
heap.
1.5 Implement poll using bubbleDown, which you’ll implement next.
1.6 Implement bubbleDown. You are highly encouraged to use one or more helper methods to
keep this code clean. In fact, we’ve included a suggested private method smallerChild,
along with its specification, that we used in our solution.
5 Phase 2
In Phase 2 you’ll implement a hash table with chaining for collision resolution. Although we
could base it on our AList class, we need access to the internals of the growth process to
handle growing and rehashing as needed. Similarly, for the underlying storage we could use
an array of LinkedLists, but dealing with Java’s LinkedList machinery ends up being a bit of
a headache—more so than simply writing a little linked list code to do the chaining by hand.
For these reasons, you’ll complete a standalone hash table implementation in HashTable.java
without using any tools from java.util. The following major design decisions have been made
for you:
• The hash table encapsulates its key-value pairs in an inner class called Pair.
• The hash table uses chaining for collision resolution.
• The Pair class doubles as a linked list node, so it has fields to store its key, value and a
reference to the next Pair in the chain.
• The underlying array doubles in size when the load factor exceeds 0.8.
Here’s some sample code using the hash table and its output using my solution. The dump
method shows the internal layout of the table: each bucket in the buckets array stores a
reference to Pair objects, each of which stores a reference to the next Pair in the chain, or
null.
Code:
HashTable<Integer,Integer> hm = new HashTable<Integer,Integer>(4);
hm.put(0,0);
hm.put(4,1);
hm.put(19,1);
hm.put(19,4);
hm.dump()
Output:
Table size: 3 capacity: 4
0: –>(4, 1)–>(0, 0)–|
1: –|
2: –|
3: –>(19, 4)–|
Your job in Phase 2 is to implement four methods: (get, put, containsKey, and remove).
Details of how you implement them are left up to you – be sure to read the specification of each
method carefully to be sure you’re implementing the specified behavior correctly, completely,
and according to the given efficiency bounds.
Your tasks:
2.0 To get familiar with the underlying storage mechanism, read the existing code in HashTable.java.
In particular, read the javadoc comments above the class, the comments describing each
field and its purpose, and the Pair inner class. Then take a look at the provided dump
method, which may be useful for debugging.
2.1 Implement get(K key).
2.2 Implement put without worrying about load factor and array growth. Make sure that
you replace the value if the key is already in the map, and insert a new key-value pair
otherwise. After implementing get and put without growing the array, you should pass
test210PutGet, test211PutGet, test212Put, test213Put, test230PutGet, and test231Put.
2.3 Implement containsKey Your code should pass test240containsKey and test241containsKey.
2.4 Implement remove. Your code should pass test250Remove and test251Remove.
2.5 Finally, modify put to check the load factor and grow and rehash the array if the load
factor exceeds 0.8 after the insertion. Use the createBucketArray helper method to
create the new array. We’ve also included a method stub for a growIfNeeded private
helper method with a suggested specification; you are welcome to implement and use
this, but not required to. At this point your code should pass all Phase 2 tests. Note: If
you’re not careful, put can have worst-case O(n
2
) runtime.
6 Phase 3
In Phase 3, we turn back to the Heap class. Now that we have a working HashTable implementation, we can overlay the hash table on the heap in order to make the contains and
changePriority operations efficient. This is a common strategy when building data structures
in the real world: often a textbook data structure does not provide efficient runtimes for all
the operations you need, so you can combine multiple textbook data structures to get more
efficient operations at the cost of some extra bookkeeping and storage.
In the Phase 1 heap, finding a given value in the tree requires searching the whole tree, so the
runtime is O(n). In Phase 3, we’ll use a HashTable to map from values to heap indices,
which allows us to find any value in the heap in expected O(1) time, as we discussed in Lecture
17. Keeping the HashTable up to date requires small changes throughout the Heap class to
make sure that the HashTable stays consistent with the state of the heap – whenever the heap
is modified, we need to update the HashTable to match.
One constraint imposed by the use of a HashTable is that each value can only map to a single
index. This means that if we insert two entries with equal value into the table, we can’t
differentiate between them and store both indices HashTable. To deal with this, we will simply
add the requirement that all values stored in the Heap must be distinct. Note that two different
values may still have equal priorities.
Your tasks are as follows:
3.1 Update the add method to keep the map consistent with the state of the heap. Also be
sure to throw an exception if a duplicate value is inserted—the map makes it possible to
check this efficiently. For now, don’t worry about the map during bubbleUp—the next
TODO will handle this. At this point, your code should pass test300Add.
3.2 3.2 swap – update the swap method to keep the HashTable consistent with the heap.
If you used swap to implement both bubbling routines, bubbleUp and bubbleDown You
should now pass test310Swap and test315Add BubbleUp.
3.3 Update poll. Your code should now pass test330Poll BubbleDown NoDups and
test340testDuplicatePriorities.
3.4 Implement contains. Once again, the map makes this easy and efficient. You should
pass test350contains.
3.5 Implement changePriority by finding the value in question, updating its priority, and fixing the Heap property by bubbling it up or down. You should now pass test360ChangePriority.
At this point, your code should be correct! Check the following things before you submit:
1. Each method adheres to the asymptotic runtime given in its specification, if any.
2. Your code follows the style guidelines set out in the rubric and the syllabus.
3. Your submission compiles and passes all tests on the command line in Linux without
modification.
4. All code is committed and pushed to your A3 GitHub repository.
7 Extra Credit: Test Suite Gaps
This assignment has no optional enhancements – the base assignment will be worth all 50 points.
There is an opportunity for extra credit if you discover a gap in my test suite. Specifically, if
you discover that your implementation passes all the provided test cases (for AList or for any
phase of A3) but has a bug, you can earn up to 3 points of extra credit for providing a correct
test case that catches the bug.
If you find such a bug:
1. With the bug present in your solution code, Create a “testing” branch in your repository
(git checkout -b testing).
2. Modify the appropriate testing class (AListTest.java or A3Test.java) to include your additional test case. When this test file is run, your test should be the only one that fails.
3. Commit your test file changes and push your testing branch to github.
4. Email Scott (scott.wehrwein@wwu.edu) with a description of the bug and the test you’ve
written.
Do not wait until submission time—submit these whenever you encounter them. You can
switch back to your master branch, fix the bug in your solution, and continue working on the
project.
8 How and What to Submit
On the first line of hours.txt, give an integer estimate of the number of hours you spent on
this assignment. Below that, feel free to let me know how it went, anything you struggled with,
etc.
Submit the assignment by pushing your final changes to GitHub (git push origin master or
just git push) before the deadline.
Rubric
Points are awarded for correctness and efficiency of your program, and points can be deducted
for errors in commenting, style, clarity, and following assignment instructions. Correctness will
be judged based on the unit tests provided to you.
Git Repository
Code is pushed to github and hours spent appear as a lone integer in hours.txt 1 point
Code : Correctness
Each unit test is worth 1 point. A full report including any unit test failures will
be committed to your repository in the grading branch.
32
Code : Efficiency
Heap.add is average-case O( log n) and worst-case O(n), unless rehashing is
necessary in which case the runtime is expected O(C + n) and worst-case O(C +
n
2
), where C is the smaller array’s capacity.
2
Heap.peek is O(1) 1
Heap.poll is average-case O( log n) and worst-case O(n) 2
HashTable.get is average-case O(1), worst-case O(n) 2
HashTable.put is average-case O(1), worst-case O(n) 2
HashTable.containsKey is average-case O(1), worst-case O(n) 2
HashTable.remove is average-case O(1), worst-case O(n) 2
Heap.contains is average-case O(1) and worst-case O(n) 2
Heap.changePriority is average-case O( log n) and worst-case O(n) 2
Clarity deductions (up to 2 points each)
Include author, date and purpose n a comment comment at the top of each file
you write any code in
Methods you introduce should be accompanied by a precise specification
Non-obvious code sections should be explained in comments
Indentation should be consistent
Method should be written as concisely and clearly as possible
Methods should not be too long – use private helper methods
Code should not be cryptic and terse
Variable and function names should be informative
Total 50 points
CSCI241 Assignment 4
1 Overview
In A4, you will implement Dijkstra’s Single-Source Shortest Paths algorithm using a Graph
class provided to you. You will write methods in the ShortestPaths class to implement Dijkstra’s algorithm, reconstruct shortest paths after the algorithm has executed, and complete a
command-line program that computes shortest paths on a graph read from a text file.
As usual, please keep track of the hours you spend on this assignment and include them in your
submission as a lone integer in the provided hours.txt file.
2 Getting Started
The Github Classroom invitation link for this assignment is in Assignment 4 on Canvas. Begin
by accepting the invitation and cloning a local working copy of your repository as usual.
2.1 import heap.Heap
This assignment makes use of the priority queue you implemented in the Heap class in A3,
which in turn depends on your HashTable and AList classes. Build your A3 code and copy
the heap.jar file from your-a3-repo/build/libs/ into your-a4-repo/libs. Gradle will now
know to include this jarfile when compiling your A4 code, and you can use the Heap class by
importing it at the top of ShortestPaths.java: import heap.Heap.
If you are not confident in the correctness of your A3 solution, you may download our heap.jar
from Canvas and use that instead. Also if you’re encountering strange bugs in your shortest
paths algorithm, try plugging in the correct heap.jar to make sure the problem isn’t originating
in your A3 code.
3 Testing A4
Your grade will be partly based on unit tests, but this time around you have not been provided
with these tests. It is your responsibility to test your implementation and be certain
that the algorithm is implemented correctly. ShortestPathsTest.java contains a placeholder test case to get you started writing your own tests.
You will lose a few points if you do
not write at least a few unit tests in ShortestPathsTest.java, although we will not be grading
your test cases for correctness or comprehensiveness.
Some testing advice:
• You should test your code both using JUnit test cases and the command line program
implemented in the main method.
• The two graphs given in the in-class exercise from the first Dijkstra lecture (Simple1.txt
and Simple2.txt) are provided in src/test/resources/ directory. Run the algorithm by hand
to determine the correct answers for these graphs and verify that your implementation
arrives at the correct paths and path lengths. Keep in mind that if algorithm works on
these simple graphs, it does not necessarily work on all graphs.
• You can write unit tests either by constructing Graph objects from scratch in code, or
by parsing graphs from files in the src/test/resources directory. Opening these files
at test time requires a little bit of gradle trickery, but we’ve included some example
code and a helper method in ShortestPathsTest.java that finds the path for a given
filename and uses ShortestPaths.parseGraph to read a graph from a file located in
src/test/resources.
• Make sure your algorithm handles edge cases correctly, including behaving as specified
when the destination node is unreachable. Test this using the simplest possible test cases
(for example, this edge case could be tested using a two-node graph with only an edge
from destination to origin).
• The BasicParser class parses a simple edge list from a text file, such as Simple1.txt,
Simple2.txt, and FakeCanada.txt. Feel free to write and test using additional graph files
in this format.
• See the information in data/db1b info.txt for how to download a larger, more real-world
dataset of flight information and run your algorithm on it.
4 Implementing The Algorithm
The abstract version of Dijkstra’s algorithm presented in lecture is as follows:
/** compute shortest paths to all ndoes from
* origin node v */
shortest_paths(v):
S = { };
F = {v};
v.d = 0;
v.bp = null;
while (F != {}) {
f = node in F with min d value;
Remove f from F, add it to S;
for each neighbor w of f {
if (w not in S or F) {
w.d = f.d + weight(f, w);
w.bp = f;
add w to F;
} else if (f.d + weight(f,w) < w.d) {
w.d = f.d + weight(f,w);
w.bp = f;
}
}
}
Your implementation should follow this abstract algorithm as closely as possible, but the graph
representation won’t match the pseudocode exactly because we have to deal with practical
implementation considerations.
The following design decisions have been made for you:
• So that we can efficiently find the node in F with minimum d value, the Frontier set is
stored in a min-heap, using d-values as priorities. Because a Node’s d-value may change,
you will need to use the Heap’s changePriority method to keep the priorities updated.
• Instead of the Node class having a field for d and bp, we store these things separately. The
ShortestPaths class maintains a Map that associates each Node with a PathData object,
which stores the distance and backpointer for a node.
• The Settled set does not need to be explicitly stored. If a Node has a PathData object
associated with it, it is in either the Settled (not in the heap) or Frontier set (in the heap);
otherwise it is in the Unexplored set.
5 Main Method Behavior
The main method behavior is specified in the descriptions below and in comments associated
with each TODO item. In brief, the program takes 3 or 4 command-line arguments. The first
two specify the file type (basic or db1b) and the filename where the graph data is stored. The
third is an origin node id, from which shortest paths should be computed.
If no fourth argument
is supplied, the program should print all reachable nodes and their shortest path distances. If
the fourth argument is supplied, it gives a destination node, and the program should print in
order the nodes along the path from the origin to the destination, followed by the length of the
path.
Two sample invocations of the program are given below:
$ gradle run –args “basic src/test/resources/Simple0.txt A”
Graph has:
3 nodes.
3 edges.
Average degree 1.0
Shortest paths from A:
B: 1.0
C: 2.0
A: 0.0
$ gradle run –args “basic src/test/resources/Simple0.txt A C”
Graph has:
3 nodes.
3 edges.
Average degree 1.0
A C 2.0
6 Your Tasks
You will be implementing Dijkstra’s algorithm, writing helper methods to reconstruct paths and
fetch path lengths, and finishing the main method, all in ShortestPaths.java. As in A3, there
are not very many lines of code to write, but you cannot write them without first understanding
the algorithm and the codebase you’re working in.
0. Begin by looking over the early slides of Lecture 19, where the implementation details are
covered. Carefully read and understand the /** Javadoc comments */ in Graph.java,
Node.java, and ShortestPaths.java. Read over the code in these files as well. When you’re
done, you should be able to answer questions such as:
(a) What is the purpose of each of the following HashMaps?
• Graph’s nodes field
• Node’s neighbors field
• ShortestPath’s paths field
(b) Where is the Graph’s adjacency list stored, how would you iterate over all edges
leaving a given Node, and how would you get the weight of each edge?
(c) What are types of the Values and Priorities in the min-heap storing F?
(d) For a given Node object, where are n.d and n.bp stored?
1. Implement the compute method in the ShortestPaths class according to its specification.
2. Implement the shortestPathLength method. Notice that this method’s precondition
states that compute has been called with the desired origin node, so the paths field
should already be filled in with the final shortest paths data.
3. Implement the shortestPath method. Once again, compute has already been called with
the desired origin so you only need to use the data stored in paths to reconstruct the
path.
4. In the main method, create and use an instance of ShortestPaths to compute shortestpaths data from the origin node specified in the command line arguments.
5. If a destination node was not specified on the command line, print a table of reachable
nodes and their shortest path lengths.
6. If a destination node was specified on the command line, print the nodes in the shortest
path from the origin to the destination, followed by the length of the path.
7 Enhancements
You’ll have noticed based on ShortestPaths’ main method that the codebase is able to read
graphs from two types of text files, denoted basic and db1b. When a db1b file is given, the
DB1BParser class parses a Graph from a csv file with data showing actual flights, their origin
and destination, and the number of miles flown. The DB1BCoupon dataset I used is available
at https://www.transtats.bts.gov/Tables.asp?DB_ID=125.
When I started making plans to see my folks in Vermont for winter break, I wondered about
the shortest path from Bellingham to Burlington, Vermont. I downloaded flight data from
the Bureau of Transporation Statistics for the first quarter of 2018 and ran the following command:
% gradle run –args “db1b data/DB1BCoupon_2018_1.csv BLI BTV”
BLI GEG MSP BUF BTV 2454.0
It turns out the shortest route to Vermont, in terms of miles, involves flying from Bellingham
to Spokane to Minneapolis to Buffalo to Burlington, a total of 4 flights! Surely that’s not the
best way to fly there.
In air travel, it’s rarely the case that more flights get you there faster than fewer flights—you
usually want to get there with the fewest hops possible. For 5 points of enhancement credit,
create an enhancements branch and extend your ShortestPaths class to also compute the route
between a given origin and destination with the fewest flights, regardless of the total number
of miles flown. For full credit, compute both the path itself and the number of edges in the
path.
How you design your solution is up to you, but you should keep your modifications in ShortestPaths.java and make sure they’re in a separate enhancements branch. Write a detailed
comment at the top of ShortestPaths.java explaining
1. How to use your program (try to keep it similar to the behavior of the base assignment).
2. How you solved the problem and any design decisions you made, including any relation
your solution has to algorithms we’ve discussed in class.
8 How and What to Submit
Check the following things before you submit:
1. Your code follows the style guidelines set out in the rubric and the syllabus.
2. Your submission compiles and runs on the command line in Linux without modification.
3. hours.txt contains a lone integer with the estimated number of hours you spent on this
assignment.
4. All code, including unit tests, is committed and pushed to your A4 GitHub repository.
Submit the assignment by pushing your final changes to GitHub (git push origin master or
just git push) before the deadline.
Rubric
You can earn points for the correctness and efficiency of your program, and points can be
deducted for errors in commenting, style, clarity, and following assignment instructions. Correctness will be judged on both unit tests on your ShortestPaths class as well as the correct
behavior exhibited by the program implemented in the main method.
Git Repository
Code is pushed to github and hours spent appear as a lone integer in hours.txt 1 point
Code : Correctness
Unit tests of methods in ShortestPaths 21
Correct behavior of the ShortestPaths main method program with no destination
specified
Correct behavior of the ShortestPaths main method program with a destination
specified
Code : Unit Tests
At least a few test test are written in ShortestPathsTest.java 3
Code : Efficiency
ShortestPaths.compute uses a Heap to find the node to move from F to S in
O(log n).
5
The body of the for loop in ShortestPaths.compute runs in O(log n) expected
time.
Enhancements
Correctly computes the fewest number of hops 3
Correctly computes a path with the fewest hops 2
Clarity deductions (up to 2 points each)
Include author, date and purpose in a comment comment at the top of each file
you write any code in
Methods you introduce should be accompanied by a precise specification
Non-obvious code sections should be explained in comments
Indentation should be consistent
Method should be written as concisely and clearly as possible
Methods should not be too long – use private helper methods
Code should not be cryptic and terse
Variable and function names should be informative
Total 50 points