C 343 Project 4 – DNA Sequence Alignment solution




5/5 - (5 votes)

1 Assignment Description
Figure 1: Alignment Example
Sequence Alignment is an important technique in
understanding how similar two DNA sequences
are. Applications of DNA sequence alignment range
from determining gene function to finding similarity between species. Informally, alignment can be
understood as writing the two sequences in rows,
where the two characters in the same column are
said to be aligned. If the two aligned characters are
the same, we have a match; if the two characters
are not the same, we have a mismatch. To try and maximize the number of matches, we can
also insert gaps in either sequence. Mismatches can be interpreted as point mutations and
gaps as insertion or deletion mutations. We disallow columns that consist of gaps only.
Figure 2: Dot Board
There can be many alignments of two sequences,
but the scores assigned to matches, mismatches,
gaps determine which are the best alignments. An
optimal solution can be found using a dynamic programming approach. Dynamic programming algorithms can be visualized by a board whose cells keep
track of the best solution till that point. In bioinformatics, these are called dot boards.
The algorithms fills in the cells of this board
with the best scores for every possible prefix of the
2 Your Task
Your task is to complete the seq align function in seq align.py. This function takes three
• s1 – the fist sequence.
• s2 – the second sequence.
• enable graphics – shows the dot board if this option is True.
• The function returns a pair of sequences representing the alignment.
• The score function named s is given to you. It takes as input two characters (from the
same column) and returns the score.
• Suppose that your algorithm returns the sequences s
and s
of length n. The total
score is computed as follows:
[k], s0
Your algorithm must find the alignment (the s
and s
) that maximizes the total score.
You need to setup the recurrence equations for this dynamic programming algorithm,
initialize the recurrence, compute the scores using the recurrence equations and finally trace
the solution to produce the two aligned sequences. All of this is done in the seq align
If you’d like to see the board getting filled in, you can use the provided render board
function as follows inside of your seq align function

if enable_graphics:
render_board(screen, font, s1, s2, F)

Here is an example of the completed dot board for aligning X = AC and Y = CA. Here
the SPACE PENALTY is −1, a match is 2, and a mismatch is −2. To record our choice in each
cell, we write D for delete (a gap in Y ), I for insert (a gap in X), and M for either match or
i=0 1 2
j=0 0 D=-1 D=-2
1 C I=-1 M=-2 M=0
2 A I=-2 M=0 D=-1
Therefore, a best alignment (not unique) for AC and CA is ( AC, CA ).
3 Deliverables
Your repo’s project4 folder should contain all the files from the zip. These are the ones you
need to modify:
• seq align.py – containing your solution and tests
• README.md – where you need to explain your code
After that you can issue a pull request.
4 Testing
We have provided you with the seq align.py file where you have to implement the seq align
function. There are also tests in tests.txt file which you can run by executing run
seq align.py test. We also encourage you to write your own test cases. A good way to
do this is to use assert statements. So if your function must return True for given inputs,
the test case looks like
assert f(i) == True
You can then put all your tests inside an if statement.
if __name__ == “__main__”:
# my unit tests
The tests will be executed when you execute this python script but not when you
include this file into another script and run that. For more information look at http:
5 Running Your Code
As always, you should be able to execute the script in three modes:
• interactive – invoked by executing python seq align.py
• test – invoked by executing python seq align.py test. You can run the provided tests in
this mode.
• gentests – invoked by executing python seq align.py gentests. This mode helps you
generate new tests.