Description
Objectives
Upon completion of this assignment, you need to be able to:
• Iterate through an array from its beginning index to the end.
• Determine cumulative information as you progress through an array.
• Determine equality of string objects.
• Access the individual characters of a string object in Java.
• Continue building good programming skills.
Introduction
An important task in bioinformatics is the identification of DNA and RNA sequences. In this assignment, we will be looking at nucleic acid sequences. These sequences contain up to four different bases denoted by letters: A for adenine, C for cytosine, G for guanine, and T for thymine. Sequence strings are compared in order to determine whether nucleic acid sequences match each other, or are related through mutations. Real sequence data, as used by biochemists and in bioinformatics research, consists of very long strings of bases. Determining relatedness can require the use of very complex algorithms, beyond the scope of this assignment.
The sequences in this assignment will all contain between two and four of the possible bases in {A, C, G, T}. Your task is to search through a collection of sequence data and count how many times a specific sequence occurs. For example, if the collection contains the following sequences: {ACTG, GATC, ACT, GTC, AC, GATC, GA} and we search for the specific sequence GATC, we would report that it was found 2 times.
One of the challenges in this assignment will be dealing with mutated sequences. A mutation can occur due to insertions of additional bases within a sequence. For the purpose of this assignment, a mutated sequence contains at least two of the same bases occurring in a row; for example, in the sequence GAAATC, the A has mutated, and in the sequence CCGGAT, both the C and G have mutated. Another task in this assignment is to detect how many of the sequences in the collection are mutated.
The final task will be to search through the collection of sequence data for a specific sequence, but you must treat original and mutated sequences the same. For example, if the collection contains {TGC, AC, TTGC, TACG, TGGCC, AGTC} and we search for the specific sequence TGC, we would report that it was found 3 times, because TTGC and TGGCC are mutated forms of TGC.
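The mutation rules described above can be sketched in Java as follows. This is only one possible approach, not the required solution: the class name MutationSketch and the method names isMutated, collapseRepeats, countMutated and countIgnoringMutations are hypothetical, and the method headers you submit must come from the specification document. The sketch also assumes that the sequence being searched for is itself unmutated.

```java
// Illustrative sketch only. All names here are hypothetical and are NOT
// the method headers required by the specification document.
class MutationSketch {

    // Returns true if any base occurs two or more times in a row,
    // for example the AAA in "GAAATC".
    static boolean isMutated(String sequence) {
        for (int i = 1; i < sequence.length(); i++) {
            if (sequence.charAt(i) == sequence.charAt(i - 1)) {
                return true;
            }
        }
        return false;
    }

    // Collapses every run of a repeated base to a single base,
    // so "TTGC" and "TGGCC" both become "TGC".
    static String collapseRepeats(String sequence) {
        String result = "";
        for (int i = 0; i < sequence.length(); i++) {
            if (i == 0 || sequence.charAt(i) != sequence.charAt(i - 1)) {
                result += sequence.charAt(i);
            }
        }
        return result;
    }

    // Counts how many sequences in the collection are mutated.
    static int countMutated(String[] sequences) {
        int count = 0;
        for (int i = 0; i < sequences.length; i++) {
            if (isMutated(sequences[i])) {
                count++;
            }
        }
        return count;
    }

    // Counts matches of target, treating mutated and original forms alike.
    // Assumption: target itself contains no repeated bases.
    static int countIgnoringMutations(String[] sequences, String target) {
        int count = 0;
        for (int i = 0; i < sequences.length; i++) {
            if (collapseRepeats(sequences[i]).equals(target)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String[] data = {"TGC", "AC", "TTGC", "TACG", "TGGCC", "AGTC"};
        System.out.println(countMutated(data));                  // prints 2
        System.out.println(countIgnoringMutations(data, "TGC")); // prints 3
    }
}
```

Collapsing each run of repeated bases before comparing is a design choice made for this sketch; walking through both strings at the same time and skipping repeats would work equally well.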
Quick Start
(1) Download this PDF file and store it on your computer.
(2) Create a directory / folder on your computer specific to this assignment. For example, CSC110/assn5 is a good name.
(3) Save a file called SearchDNA.java in your directory. Fill the source code with the class name and the methods outlined in the following specification document.
(4) Start with the easiest methods first. See the detailed instructions below for some tips.
(5) Complete each method and test it thoroughly before moving on to the next one. Compile and run the program frequently; it is much easier to debug an error shortly after a successful and error-free run.
Detailed Instructions
• Start with the printArray method. Focus on passing in an array of Strings as a parameter and using a loop to visit each element in the array. Remember that array indices start at 0, as do the indices of each character in a String.
• After completing and testing printArray, work on the findLongest or the findFrequency methods, as they are quite similar. Within both methods, use a loop to visit each element in the array. In the findLongest method, you must keep track of which String object in the array contains the most characters, whereas in the findFrequency method, you must keep track of how many times a specific String object is found in the array. Make sure you finish and test both of these before moving to the next step.
• The methods involving mutations are a little more difficult. In this assignment, a mutation occurs when the same character appears two or more times in a row in a string. Think about how you might be able to detect a mutation in a String object. Once you come up with a strategy, test it with a number of Strings to see if it works!
• As you work through a solution, we strongly recommend that you save, compile and test the code after every line or two. This can be something as easy as printing out the value of a variable, or calling a method to print out the value returned. It is important to do this to confirm a component of the code works correctly, so you can be confident using that component throughout the code in later steps.
• For each of the methods listed in the specifications, you must provide an internal test call from the main method. If the method does not behave as expected, then debug and adjust the method. To receive full marks for testing, each method must be tested, even if it does not work.
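As a rough illustration of the array-traversal steps above, here is a minimal Java sketch. It uses the method names printArray, findLongest and findFrequency from this handout, but the class name ArraySketch, the parameter lists, the return types and the output format are assumptions; the exact method headers are given in the specification document and take precedence. The sketch also assumes that findLongest returns the first of several equally long Strings.

```java
// Illustrative sketch only. Parameter lists, return types and output
// format are assumptions; follow the specification document instead.
class ArraySketch {

    // Visits every element, from index 0 to the last index, and prints it.
    static void printArray(String[] sequences) {
        for (int i = 0; i < sequences.length; i++) {
            System.out.println(sequences[i]);
        }
    }

    // Keeps track of the String with the most characters seen so far.
    // Assumption: on a tie the earliest String wins; an empty array gives "".
    static String findLongest(String[] sequences) {
        String longest = "";
        for (int i = 0; i < sequences.length; i++) {
            if (sequences[i].length() > longest.length()) {
                longest = sequences[i];
            }
        }
        return longest;
    }

    // Counts how many elements are equal to target.
    // Strings are compared with equals(), not ==, to compare their contents.
    static int findFrequency(String[] sequences, String target) {
        int count = 0;
        for (int i = 0; i < sequences.length; i++) {
            if (sequences[i].equals(target)) {
                count++;
            }
        }
        return count;
    }

    // Internal test calls from main, one per method.
    public static void main(String[] args) {
        String[] data = {"ACTG", "GATC", "ACT", "GTC", "AC", "GATC", "GA"};
        printArray(data);
        System.out.println(findLongest(data));           // prints ACTG
        System.out.println(findFrequency(data, "GATC")); // prints 2
    }
}
```

Each call in main doubles as an internal test: compare the printed value with the value you worked out by hand before moving on to the next method.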
Examples
We provide a couple of internal test cases you can use to test the correctness of the methods. The following shows the method call to printArray from inside the main method and the output that is expected from a working printArray method.
Submission
Submit the following completed file to the Assignment folder on conneX.
• SearchDNA.java
Please make sure you have submitted the required file(s) and conneX has sent you a confirmation email. Do not send .class (the bytecode) files. Also, make sure you submit your assignment, not just save a draft. Draft copies are not made available to the instructors, so they are not collected with the other submissions. We can find your draft submission, but only if we know that it’s there.
A note about academic integrity
It is OK to talk about your assignment with your classmates, and you are encouraged to design solutions together, but each student must implement their own solution.
Grading
Marks are allocated for …
• No errors during compilation of the source code.
• The method headers must be exactly as specified and the methods must perform as specified. Be sure to read the Method Details and not just the summaries.
• Each method must have a test call inside the main method.
• Style of the source code meets the requirements outlined in the Style Guidelines document available in the Resources folder of conneX.