Description
1 Overview
This project asks you to evaluate three sorting algorithms we have discussed
in class on various inputs and relate their theoretical run time to their empirical (actual) behavior. These three algorithms are InsertionSort, MergeSort,
and QuickSort. In the interests of time, you have been provided with code
that implements all three algorithms and can run them on inputs of varying
sizes and types.
2 Running the provided program
Along with this document, you have been provided with source code and
a Makefile for the various sorting algorithms and input types to test in this
project. This program will run the specified algorithm (InsertionSort, MergeSort, or QuickSort) on a sorted, random, or constant input array of a given
size. I expect that you can use an IDE or the Linux make utility in order
to compile the code. Once compiled, the program should be run from the
command line, as it expects three arguments. If fewer than three arguments
are specified, the program will assume default values for any unspecified
arguments.
The first of these three arguments represents the size of the input array. The program only accepts sizes between 1 and 1,000,000,000, inclusive
(default 10,000). The second argument represents the sorting algorithm you
wish to run. Valid algorithms include InsertionSort, MergeSort, and QuickSort (default MergeSort), or you can abbreviate the algorithms by their first
letter (‘i’, ‘m’, or ‘q’). The last argument represents the type of input to
sort. Valid input types are sorted, random, and constant (or ‘s’, ‘r’, or ‘c’;
default ‘r’), where ‘random’ is an unsorted array, ‘sorted’ is a sorted array
(in increasing order), and ‘constant’ is an array where every entry is identical.
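For example, assuming the compiled executable is named sort (the actual name
may differ depending on the provided Makefile), running QuickSort on a random
array of one million elements would look like:

    ./sort 1000000 q r

Leaving off the second and third arguments would fall back to the defaults
(MergeSort on a random array).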
To improve timing stability, the program runs the requested sort three times
and reports the median of the three timing results to you.
In order to get the most accurate timing results, there should be no other
processes running on the machine at the same time, but this may not always
be possible, especially if you are running the program on a lab machine.
3 Project report
Your project report should be divided into two parts, Results and Analysis.
For the Results section, you will need to prepare a data file describing the
performance of your algorithms, and in the Analysis portion, you will need
to prepare a table describing their complexity, as well as a response to these
results.
3.1 Results
Run each of the three sorting algorithms on constant, sorted, and random
arrays whose sizes are powers of 10. For each of the nine algorithm/input
combinations, you should record the following:
1. nmin: the smallest power of 10 that takes 20 milliseconds or more to
run;
2. tmin: the time to sort an array of size nmin;
3. nmax: the largest power of 10 that takes 10 minutes (600 seconds) or
less to run, or 1,000,000,000 if no input took more than 10 minutes;
4. tmax: the time required to sort nmax elements.
Note: you may need to increase the available stack space before testing
QuickSort so that it does not run out (“stack overflow”). You can run
ulimit -s unlimited in Linux before executing the program to increase the
available stack space. This limit can also be changed in many IDEs on Mac and
Windows; consult the appropriate documentation if this becomes a problem.
If you are having trouble getting good timing results (e.g., one power of ten
gives good timing results, but the next is too long and the previous too
quick), you may try using other input sizes, such as 3.2 · 10^x, 1.8 · 10^x,
or 5.6 · 10^x (approximately 10^(x+0.5), 10^(x+0.25), and 10^(x+0.75),
respectively). You may also use different systems to perform the tests for
different algorithm/input combinations, but obviously, you’ll want all of the
runs for a single experiment to be on the same machine, as a timing ratio
taken across different machines is virtually meaningless.
You should enter your results into a comma-separated value (CSV) file.
The CSV file should contain 5 columns and 10 rows. Your first column
should label the 9 different experiments, while the first row should label the
experiment variables (nmin, tmin, nmax, and tmax). Your row labels should include
the algorithm name (InsertionSort, MergeSort, or QuickSort) and input type
(Sorted, Random, or Constant). You may abbreviate these labels as I, M, Q,
and S, R, C. (For example, MS represents your MergeSort result on a sorted
array.) An example table appears below. You may use Excel (or any other
software) to prepare your data.
nmin tmin nmax tmax
IC
IS
IR
MC
MS
MR
QC
QS
QR
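In CSV form, the same skeleton might look like the following, with your
measured values filling the empty cells:

    ,nmin,tmin,nmax,tmax
    IC,,,,
    IS,,,,
    IR,,,,
    MC,,,,
    MS,,,,
    MR,,,,
    QC,,,,
    QS,,,,
    QR,,,,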
3.2 Analysis
In this section, you will estimate the complexity of the three algorithms by
comparing, for each of the nine experiments, the ratio tmax/tmin to ratios
representing candidate complexities. Specifically, you should compute
f(nmax)/f(nmin) for f1(n) = n, f2(n) = n ln(n), and f3(n) = n^2, rounding to
the nearest integer. Finally, you should label each experiment according to
which of these ratios tmax/tmin most resembles.
For example, if one of your experiments resulted in nmin = 100 and
nmax = 10,000,000, your ratios would be f1(nmax)/f1(nmin) = 100,000,
f2(nmax)/f2(nmin) = 350,000, and f3(nmax)/f3(nmin) = 10,000,000,000. You
would then label the algorithm based on which of these three ratios tmax/tmin
is closest to.
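If you want to check your arithmetic, the ratios can be computed with a short
script. The following Python sketch (Python is used here only for
illustration; it is not part of the provided code) reproduces the example
numbers above:

    import math

    # Example values from the text; substitute your own measurements.
    n_min, n_max = 100, 10_000_000

    f1_ratio = n_max / n_min                                           # f1(n) = n
    f2_ratio = (n_max * math.log(n_max)) / (n_min * math.log(n_min))   # f2(n) = n ln(n)
    f3_ratio = (n_max ** 2) / (n_min ** 2)                             # f3(n) = n^2

    print(round(f1_ratio))  # 100000
    print(round(f2_ratio))  # 350000
    print(round(f3_ratio))  # 10000000000

Comparing your measured tmax/tmin to these three numbers then tells you which
behavior label to assign.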
As part of your report, you should create a chart that includes the computed ratios as well as the behavior of the algorithm (Linear, n lg n, or
Quadratic), across all 9 experiments. An example chart appears below:
tmax/tmin f1 ratio f2 ratio f3 ratio Behavior
IC
IS
IR
MC
MS
MR
QC
QS
QR
You should then write a summary of (1) how your results compare to
the theoretical analysis for the three algorithms (below), and (2) why your
results make sense or are surprising. You should spend more time explaining
your results when they are unusual or unexpected.
                Best-case      Average-case   Worst-case
                complexity     complexity     complexity
InsertionSort   Ω(n)           Θ(n^2)         O(n^2)
MergeSort       Θ(n lg n)      Θ(n lg n)      Θ(n lg n)
QuickSort       Ω(n lg n)      Θ(n lg n)      O(n^2)
4 Submission
For this project, you should submit a zip archive containing (1) a CSV file
with your results (described in Section 3.1), and (2) your tables and
analysis (described in Section 3.2), in PDF format.
Note: This is an individual project. You are not allowed to submit work
that has been pulled from the Internet, nor work that has been done by your
peers. Your submitted materials will be analyzed for plagiarism. Project 1
will be evaluated out of 50 points:
5 Grading
Data file containing results 15 points
Table with ratios 15 points
Analysis 20 points
Requirements for each portion of the grade are described in Sections 3.1
and 3.2.