Name: COL106 Assignment 6 The substring matching problem solved
SKU: 4244
Price: 29.99 USD
Availability: InStock

Description


In this assignment we will address a basic problem in text processing: the substring matching problem. First note that a string is an array of characters. Given two strings, a of length n and b of length m, we say that a is a substring of b if there is an i with 0 ≤ i ≤ m−n such that a[j] = b[i + j] for all j with 0 ≤ j ≤ n−1. For example, “at” is a substring of “cat”, “bat”, and “attendance” but not of “basin”.

Given a string s and a query string q, the substring matching problem asks us to answer questions like:
• Is q a substring of s?
• How many times does q occur as a substring of s? (The occurrences may be overlapping or non-overlapping.)
• At what position(s) of s does q occur as a substring?
• Does a string q′ that differs from q in only k characters occur as a substring of s? (Example: the string “tag” is not a substring of the string “category”, but it differs in only 1 character from “teg”, which is a substring.)
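The definition above can be checked directly, without any special data structure. The following is a minimal sketch of that brute-force check; the class and method names (NaiveSubstring, mismatches, occursWithin) are hypothetical and not part of the assignment, and it is far slower than a suffix trie on repeated queries.

```java
public class NaiveSubstring {
    // Counts the indices j at which a[j] != b[i + j].
    // Assumes 0 <= i <= b.length() - a.length().
    static int mismatches(String a, String b, int i) {
        int count = 0;
        for (int j = 0; j < a.length(); j++) {
            if (a.charAt(j) != b.charAt(i + j)) {
                count++;
            }
        }
        return count;
    }

    // True if some window of b differs from a in at most k characters.
    // With k = 0 this is exactly the substring definition.
    static boolean occursWithin(String a, String b, int k) {
        for (int i = 0; i + a.length() <= b.length(); i++) {
            if (mismatches(a, b, i) <= k) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(occursWithin("at", "attendance", 0)); // true
        System.out.println(occursWithin("at", "basin", 0));      // false
        System.out.println(occursWithin("tag", "category", 0));  // false
        System.out.println(occursWithin("tag", "category", 1));  // true, via "teg"
    }
}
```

Each query costs time proportional to n times m here, which is the cost the suffix trie is meant to avoid.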
Important: For the purposes of this assignment we will do case-insensitive matching, i.e., we will treat upper and lower case letters as the same.

Sort-of important: This assignment involves a significant amount of self-study, since you are required to code a solution to the substring matching problem using suffix tries, an efficient data structure for the problem, and this data structure will not be covered in class.
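To make the idea of a suffix trie concrete, here is a minimal sketch that inserts every suffix of the text character by character and folds case before inserting or querying. It is only a quadratic-time baseline for intuition, not a linear-time construction, and the names NaiveSuffixTrie, Node, and contains are assumptions made for this illustration.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class NaiveSuffixTrie {
    // One trie node: children keyed by the next character.
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
    }

    private final Node root = new Node();

    // Inserts every suffix of the text, one character at a time.
    // This takes O(n^2) time and space in the worst case.
    public NaiveSuffixTrie(String text) {
        String s = text.toLowerCase(Locale.ROOT); // case-insensitive matching
        for (int start = 0; start < s.length(); start++) {
            Node node = root;
            for (int j = start; j < s.length(); j++) {
                node = node.children.computeIfAbsent(s.charAt(j), c -> new Node());
            }
        }
    }

    // A query is a substring exactly when it labels a path from the root.
    public boolean contains(String query) {
        Node node = root;
        for (char c : query.toLowerCase(Locale.ROOT).toCharArray()) {
            node = node.children.get(c);
            if (node == null) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        NaiveSuffixTrie trie = new NaiveSuffixTrie("Category");
        System.out.println(trie.contains("TEG")); // true
        System.out.println(trie.contains("tag")); // false
    }
}
```

Because every substring is a prefix of some suffix, a lookup only walks as many nodes as the query has characters, regardless of the length of the text.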
Very important: There are many implementations of suffix tries and associated algorithms available on the Web. Please do not copy them. We will randomly check for copying using Turnitin, and if we find significant matches with the database then we will take strict disciplinary action. The minimum punishment will be an F grade in the class. You are to write all code yourself.
Assignment 6 [100 marks] Deadline: 11:55 PM, 1 November 2016
You are expected to do the following:
• Read and understand the trie data structure from the textbook or any other source.
• Read and understand suffix tries from any source. Sometimes you will find these called suffix trees.
• Write a Java class named SuffixTrie for a suffix trie containing all the methods you will need. Note that we are not specifying method names or implementation details this time; you have to work them out yourself. However, do include a performAction(String actionMessage) method in SuffixTrie.java, since the checker file calls it.
• Read in a file that contains the target string s and create a suffix trie out of it. Note that there is a standard algorithm for this that creates a suffix trie in Θ(n) time, where n is the length of s. Read this algorithm and implement it.
• Handle the four kinds of queries mentioned above.

Demo Instructions: In this assignment, you have been given the files checker.java, actions.txt, input.txt and output.txt. checker.java is our program, which reads the input actions from actions.txt and feeds them to SuffixTrie.java. The output.txt file contains the output corresponding to actions.txt. The input string for making the suffix trie is provided in the file named input.txt. The primary task of your assignment is to give the correct answer to the query messages (described below). We will verify the output of the program during the demo. The list of actions that you will be expected to handle:
• makeSuffixTrie filename: Reads in a single string from the file whose name is specified and creates a suffix trie. Everything from the beginning of the file to the end of the file is considered part of the string, including any special characters, spaces, or line breaks.
• isSubstring s: Returns 1 if s is a substring of the string stored in the suffix trie and 0 if not. Throws an exception if the suffix trie is empty (i.e., has not been built yet).
• numSubstrings s overlapflag: Returns the number of copies of s that can be found in the string stored in the suffix trie, or 0 if s is not a substring. Throws an exception if the suffix trie is empty (i.e., has not been built yet). If overlapflag is 1 then overlapping occurrences are treated as different instances of the string, otherwise not. For example, if our string is “banana” and the query string is “ana”, then with overlapflag 1 your program should output 2, because “ana” is found starting from locations 1 and 3 (the first location is 0). If overlapflag is 0 then the output is 1.
• posSubstrings s overlapflag: Returns a list of locations at which a copy of s can be found in the string stored in the suffix trie, or -1 if s is not a substring. Throws an exception if the suffix trie is empty (i.e., has not been built yet). If overlapflag is 1 then overlapping occurrences are treated as different instances of the string, otherwise not. For example, if our string is “banana” and the query string is “ana”, then with overlapflag 1 your program should output 1 3, indicating that “ana” is found starting from locations 1 and 3 (the first location is 0). If overlapflag is 0 then the output is only 1.
• numFuzzySubstrings s num overlapflag: Returns the number of locations where a copy of a string that differs from s in at most num locations is to be found in the string stored in the suffix trie, or -1 if no such substring exists. Each unique substring matched with at most num differences should be counted once. If num is greater than the length of s then an exception must be raised. An exception must also be raised if the suffix trie is empty. The role of overlapflag is as described above.
• posFuzzySubstrings s num overlapflag: Returns a list of locations where a copy of a string that differs from s in at most num locations is to be found in the string stored in the suffix trie, or -1 if no such substring exists. If num is greater than the length of s then an exception must be raised. An exception must also be raised if the suffix trie is empty. The role of overlapflag is as described above.
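The semantics of overlapflag and num in the position queries can be illustrated with a brute-force scan over the text, which may also serve as a reference when testing a trie-based solution. This is only a sketch under stated assumptions: the class QueryOracle and its positions method are hypothetical, non-overlapping matches are chosen greedily from left to right (which agrees with the “banana” example), an empty list stands in for the -1 result, and the rule about counting each unique substring once in numFuzzySubstrings is not modelled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class QueryOracle {
    // Start positions where a window of text differs from s in at most num characters.
    // num = 0 gives the exact matches; overlap = false skips matches that would
    // overlap the previously reported one (greedy, left to right: an assumption).
    static List<Integer> positions(String text, String s, int num, boolean overlap) {
        if (num > s.length()) {
            throw new IllegalArgumentException("num exceeds the query length");
        }
        String t = text.toLowerCase(Locale.ROOT); // case-insensitive matching
        String q = s.toLowerCase(Locale.ROOT);
        List<Integer> result = new ArrayList<>();
        int nextAllowed = 0; // earliest start that does not overlap the last match
        for (int i = 0; i + q.length() <= t.length(); i++) {
            if (!overlap && i < nextAllowed) {
                continue;
            }
            int diff = 0;
            for (int j = 0; j < q.length() && diff <= num; j++) {
                if (q.charAt(j) != t.charAt(i + j)) {
                    diff++;
                }
            }
            if (diff <= num) {
                result.add(i);
                nextAllowed = i + q.length();
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(positions("banana", "ana", 0, true));   // [1, 3]
        System.out.println(positions("banana", "ana", 0, false));  // [1]
        System.out.println(positions("category", "tag", 1, true)); // [2]
    }
}
```

The size of the list returned for num = 0 corresponds to the numSubstrings answer, so the same scan covers both the counting and the position queries for exact matches.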
