Description
String Searching
For this assignment you will be coding 3 different string searching algorithms: Boyer-Moore, Knuth-Morris-Pratt,
and Rabin-Karp. There is information about all three in the interface and more information
about Boyer-Moore and KMP in the book (also under resources on T-Square). If you implement any of
the three algorithms in an unexpected manner (i.e. contrary to what the Javadocs and PDF specify),
you may receive a 0.
For all of the search algorithms, make sure you check the simple failure cases as soon as possible. For
example, if the pattern is longer than the text, don’t do any preprocessing on the pattern/text.
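As a minimal sketch of the early check described above, the skeleton below returns before touching either sequence when the pattern cannot fit in the text. The class name EarlyExitSketch, the method name search, and the List<Integer> return type are assumptions for illustration, not the signatures from the provided Javadocs. The loop after the guard is a plain brute-force placeholder, not one of the three required algorithms.

```java
import java.util.ArrayList;
import java.util.List;

class EarlyExitSketch {
    /**
     * Hypothetical search skeleton that returns the start index of every match.
     */
    static List<Integer> search(CharSequence pattern, CharSequence text) {
        List<Integer> matches = new ArrayList<>();
        int m = pattern.length();
        int n = text.length();
        if (m > n) {
            // Simple failure case: return before any preprocessing or charAt() call.
            return matches;
        }
        // Placeholder body (brute force); real preprocessing would start here.
        for (int start = 0; start <= n - m; start++) {
            int j = 0;
            while (j < m && pattern.charAt(j) == text.charAt(start + j)) {
                j++;
            }
            if (j == m) {
                matches.add(start);
            }
        }
        return matches;
    }
}
```

Because the guard only calls length(), a pattern that is too long costs zero charAt() calls.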
Do not use Math.pow in any method for this assignment.
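Rabin-Karp implementations typically need a power of the hash base, which is where Math.pow would be tempting. The sketch below shows one integer-only alternative. The helper name intPow is hypothetical, and the actual base and hash formula are whatever the provided Javadocs specify.

```java
class PowerSketch {
    /**
     * Hypothetical helper: computes base^exponent by repeated multiplication.
     * Overflow wraps around silently, as with any int arithmetic in Java.
     */
    static int intPow(int base, int exponent) {
        if (exponent < 0) {
            throw new IllegalArgumentException(
                "Exponent must be non-negative, but was " + exponent + ".");
        }
        int result = 1;
        for (int k = 0; k < exponent; k++) {
            result *= base;
        }
        return result;
    }
}
```

Unlike Math.pow, this stays in integer arithmetic, so there is no round trip through double.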
Knuth-Morris-Pratt
Failure Table Construction
The Knuth-Morris-Pratt (KMP) algorithm relies on using the prefix of the pattern to determine how
much to shift the pattern by. The algorithm itself uses what is known as the failure table (also called
failure function). There are different ways of calculating the failure table, but we are expecting one
specific format described below.
For any string pattern, have a pointer i starting at the first letter, a pointer j starting at the second
letter, and a table called table that is the length of the pattern. Then, while j is still a valid index within
pattern:
• If the characters pointed to by i and j match, then write i + 1 to index j of the table and
increment i and j.
• If the characters pointed to by i and j do not match:
Homework 9: String Searching Due: See T-Square
– If i is not at 0, then change i to table[i - 1]. Do not increment j or write any value to
the table.
– If i is at 0, then write i to index j of the table. Increment only j.
For example, for the string abacab, the failure table will be:
a b a c a b
0 0 1 0 1 2
For the string ababac, the failure table will be:
a b a b a c
0 0 1 2 3 0
For the string abaababa, the failure table will be:
a b a a b a b a
0 0 1 1 2 3 2 3
For the string aaaaaa, the failure table will be:
a a a a a a
0 1 2 3 4 5
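The construction steps above can be sketched in Java as follows. This is a minimal illustration of the i/j procedure as described, not the required method: the class name FailureTableSketch and the method name buildFailureTable are assumptions, and the signature in the provided StringSearching.java may differ.

```java
class FailureTableSketch {
    /**
     * Builds the failure table using the two-pointer procedure from the text.
     * The method name is an assumption, not the provided signature.
     */
    static int[] buildFailureTable(CharSequence pattern) {
        int[] table = new int[pattern.length()];
        int i = 0; // end of the prefix currently being extended
        int j = 1; // position in the pattern whose table entry is being filled
        while (j < pattern.length()) {
            if (pattern.charAt(i) == pattern.charAt(j)) {
                // Match: the prefix grows by one character.
                table[j] = i + 1;
                i++;
                j++;
            } else if (i != 0) {
                // Mismatch with a non-empty prefix: fall back, keep j in place.
                i = table[i - 1];
            } else {
                // Mismatch with an empty prefix: record 0 and move on.
                table[j] = i;
                j++;
            }
        }
        return table;
    }
}
```

Calling buildFailureTable on the four example strings above reproduces the four tables shown.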
Searching Algorithm
For the main searching algorithm, the search acts like a standard brute-force search for the most part,
but in the case of a mismatch:
• If the mismatch occurs at index 0 of the pattern, then shift the pattern by 1.
• If the mismatch occurs at index j of the pattern and index i of the text, then shift the pattern
such that index failure[j-1] of the pattern lines up with index i of the text, where failure is
the failure table. Then, continue the comparisons at index i of the text (or index failure[j-1]
of the pattern). Do not restart at index 0 of the pattern.
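In addition, when a match is found, do not shift the pattern over by 1 to continue searching for more
matches. Instead, treat the match like a mismatch just past the end of the pattern: shift the pattern
such that index failure[j-1] of the pattern lines up with the next index of the text, where j is the
length of the pattern (so failure[j-1] is the last entry of the failure table).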
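One way the search loop could look is sketched below. Everything named here is an assumption for illustration: the class KmpSearchSketch, the method kmp, the List<Integer> return type, and the choice to return an empty list for an empty pattern (the provided Javadocs may require an exception instead). After a full match, the sketch resumes from the last entry of the failure table. It also stops as soon as the current alignment can no longer fit in the text, which avoids unnecessary charAt() calls.

```java
import java.util.ArrayList;
import java.util.List;

class KmpSearchSketch {
    /** Failure table built with the two-pointer procedure (hypothetical helper). */
    static int[] buildFailureTable(CharSequence pattern) {
        int[] table = new int[pattern.length()];
        int i = 0;
        int j = 1;
        while (j < pattern.length()) {
            if (pattern.charAt(i) == pattern.charAt(j)) {
                table[j] = i + 1;
                i++;
                j++;
            } else if (i != 0) {
                i = table[i - 1];
            } else {
                table[j] = 0;
                j++;
            }
        }
        return table;
    }

    /** Returns the start index of every occurrence of pattern in text. */
    static List<Integer> kmp(CharSequence pattern, CharSequence text) {
        List<Integer> matches = new ArrayList<>();
        int m = pattern.length();
        int n = text.length();
        if (m == 0 || m > n) {
            // Assumption: simple failure cases return no matches, with no preprocessing.
            return matches;
        }
        int[] failure = buildFailureTable(pattern);
        int i = 0; // index into the text
        int j = 0; // index into the pattern
        // i - j is where the pattern currently starts; stop once it cannot fit.
        while (i - j <= n - m) {
            if (pattern.charAt(j) == text.charAt(i)) {
                i++;
                j++;
                if (j == m) {
                    // Full match: record it and reuse the matched overlap.
                    matches.add(i - m);
                    j = failure[m - 1];
                }
            } else if (j == 0) {
                // Mismatch at pattern index 0: shift the pattern by 1.
                i++;
            } else {
                // Mismatch at pattern index j: realign, keep i where it is.
                j = failure[j - 1];
            }
        }
        return matches;
    }
}
```

For example, kmp("aaa", "aaaa") finds the overlapping matches at indices 0 and 1, which a restart from pattern index 0 after the first match would only find by re-reading text characters.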
CharSequence
CharSequence is an interface that is implemented by String, StringBuffer, StringBuilder and many
others. We have also included a class, SearchableString, that implements CharSequence. You may
use any class that implements CharSequence while testing your code. SearchableString allows you
to see how many times you have called charAt(). We will be looking at the number of times you call
charAt() while grading.
Do not use any method except charAt() and length(); all other methods will either throw an
exception or will return invalid data. In addition, do not attempt to circumvent the restrictions we placed
in the SearchableString class.
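To illustrate how counting charAt() calls can work, here is a small stand-in wrapper. It is not the provided SearchableString: the class name CountingSequence and the accessor getCharAtCount are hypothetical, and the real class may expose its count differently.

```java
class CountingSequence implements CharSequence {
    private final String data;
    private int charAtCount;

    CountingSequence(String data) {
        this.data = data;
    }

    @Override
    public char charAt(int index) {
        // Every character access is tallied, which is what makes comparisons measurable.
        charAtCount++;
        return data.charAt(index);
    }

    @Override
    public int length() {
        // length() is free: it does not affect the count.
        return data.length();
    }

    @Override
    public CharSequence subSequence(int start, int end) {
        // Disabled, mirroring the rule that only charAt() and length() may be used.
        throw new UnsupportedOperationException(
            "subSequence() is disabled in this sketch; use charAt() and length().");
    }

    /** Hypothetical accessor for the number of charAt() calls made so far. */
    int getCharAtCount() {
        return charAtCount;
    }
}
```

Wrapping both the pattern and the text in such a class while testing makes it easy to see whether a change to your search reads more characters than expected.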
A note on JUnits
We have provided a very basic set of tests for your code, in StringSearchingStudentTests.java.
These tests do not guarantee the correctness of your code (by any measure), nor do they guarantee you
any grade. You may additionally post your own set of tests for others to use on the Georgia Tech GitHub
as a gist. Do NOT post your tests on the public GitHub. There will be a link to the Georgia Tech
GitHub as well as a list of JUnits other students have posted on the class Piazza.
If you need help on running JUnits, there is a guide, available on T-Square under Resources, to help
you run JUnits on the command line or in IntelliJ.
Style and Formatting
It is important that your code is not only functional but is also written clearly and with good style. We
will be checking your code against a style checker that we are providing. It is located in T-Square, under
Resources, along with instructions on how to use it. We will take off a point for every style error that
occurs. If you feel that what you wrote is in accordance with good style but still sets off the style checker,
please email Carey MacDonald (careyjmac@gatech.edu) with the subject header of “CheckStyle XML”.
Javadocs
Javadoc any helper methods you create in a style similar to the existing Javadocs. Like the existing
Javadocs, the Javadocs for your helper method(s) must describe well what the method does, what
each parameter means (if any), and what the returned value is (if any). If a method is overridden or
implemented from a superclass or an interface, you may use @Override instead of writing Javadocs.
Exceptions
When throwing exceptions, you must include a message by passing in a String as a parameter. The message
must be useful and tell the user what went wrong. “Error”, “BAD THING HAPPENED”,
and “fail” are not good messages. The name of the exception itself is not a good message.
For example:
throw new PDFReadException("Did not read PDF, will lose points.");
throw new IllegalArgumentException("Cannot insert null data into data structure.");
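As a hedged sketch of how such messages might look in this assignment, the helper below validates search arguments. Which inputs are actually illegal, and which exception type to throw, is defined by the provided Javadocs. The class name ValidationSketch, the method name validate, and the specific conditions checked here are assumptions.

```java
class ValidationSketch {
    /**
     * Hypothetical argument check; the conditions are assumed, not taken from the Javadocs.
     */
    static void validate(CharSequence pattern, CharSequence text) {
        if (pattern == null || pattern.length() == 0) {
            // The message names the offending argument and says what is wrong with it.
            throw new IllegalArgumentException(
                "Cannot search for a null or empty pattern.");
        }
        if (text == null) {
            throw new IllegalArgumentException(
                "Cannot search within null text.");
        }
    }
}
```

Each message says which argument was rejected and why, so the caller does not have to read the stack trace to find the problem.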
Generics
If available, use the generic type of the class; do not use the raw type of the class. For example, use new
LinkedList<Integer>() instead of new LinkedList(). Using the raw type of the class will result in a
penalty.
Forbidden Statements
You may not use these in your code at any time in CS 1332.
• break may only be used in switch-case statements
• continue
• package
• System.arraycopy()
• clone()
• assert()
• Arrays class
• Array class
• Objects class
• Stack class
• Collections class
• Collection.toArray()
• Reflection APIs
• Inner, nested, or anonymous classes
• Math.pow() (for this homework only)
Debug print statements are fine while you are developing, but nothing should be printed when we run
your code. We expect clean runs; printing to the console when we’re grading will result in a penalty.
If you use any of the forbidden statements listed above, we will take off points.
Provided
The following file(s) have been provided to you. There are several, but you will edit only one of them.
1. StringSearching.java
This is the class in which you will implement the different string searching algorithms. Feel free
to add private static helper methods but do not add any new public methods, new classes,
instance variables, or static variables.
2. StringSearchingStudentTests.java
This is the test class that contains a set of tests covering the basic operations on the StringSearching
class. It is not intended to be exhaustive and does not guarantee any type of grade. Write your
own tests to ensure you cover all edge cases.
3. SearchableString.java
This is a wrapper class around a String object. It counts the number of times charAt() has been
called and disables some other unnecessary operations. Do not modify this file.
Deliverables
You must submit all of the following file(s). Please make sure the filename matches the filename(s)
below, and that only the following file(s) are present. T-Square does not delete files from old uploads;
you must do this manually. Failure to do so may result in a penalty.
After submitting, be sure you receive the confirmation email from T-Square, and then download your
uploaded files to a new folder, copy over the interfaces, recompile, and run. It is your responsibility to
re-test your submission and discover editing oddities, upload issues, etc.
1. StringSearching.java