Read in from the user: 1) a query String (which must consist of characters A, C, T, G representing a sequence of nucleotides), 2) the name of an input file containing a potentially long sequence of nucleotides (characters A, C, T, G) representing the DNA of some organism, and 3) a threshold value that defines a successful match (as described below).
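The input step could be sketched as follows. This is a minimal sketch, not a required design: the class name `BlastInput`, its fields, and the prompt wording are hypothetical, and `readDna` assumes the DNA file may contain line breaks that should be ignored, which the assignment does not state.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Scanner;

/** Hypothetical holder for the three user inputs; all names are illustrative. */
final class BlastInput {
    final String query;
    final String fileName;
    final int threshold;

    BlastInput(String query, String fileName, int threshold) {
        this.query = query;
        this.fileName = fileName;
        this.threshold = threshold;
    }

    /** Prompts for the query, the DNA file name, and the threshold, one per line. */
    static BlastInput read(Scanner in) {
        System.out.print("Query String: ");
        String query = in.nextLine().trim();
        System.out.print("DNA file name: ");
        String fileName = in.nextLine().trim();
        System.out.print("Threshold: ");
        // Integer.parseInt accepts an optional leading sign, so "+1" and "1" both work.
        int threshold = Integer.parseInt(in.nextLine().trim());
        return new BlastInput(query, fileName, threshold);
    }

    /** Reads the whole file and removes whitespace (assumption: line breaks are not data). */
    static String readDna(Path file) throws IOException {
        String text = new String(Files.readAllBytes(file), StandardCharsets.US_ASCII);
        return text.replaceAll("\\s+", "");
    }

    public static void main(String[] args) throws IOException {
        BlastInput input = read(new Scanner(System.in));
        String dna = readDna(Paths.get(input.fileName));
        System.out.println("Read " + dna.length() + " nucleotides from " + input.fileName);
    }
}
```

Reading the file in one call keeps the sketch short; for a very long sequence, reading in chunks may be preferable.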
The query String should be validated and its length (n) computed. Store the DNA sequence in an ArrayList, where the element at position i contains a word of length n starting at position i of the DNA sequence. For example, if the DNA sequence in the input file is "ACTGGCCTA" and the query String is "CTG" (a String of length 3), then the list will look like this: <"ACT", "CTG", "TGG", "GGC", "GCC", "CCT", "CTA"> representing all of the n-letter sequences in the DNA file.
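One possible way to validate the query and build the list of n-letter words is sketched below. The class and method names (`DnaWords`, `isValidSequence`, `buildWords`) are hypothetical, and rejecting lowercase letters is an assumption, since the assignment only names the characters A, C, T, G.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper class; names are illustrative only. */
final class DnaWords {
    /** Returns true if s is non-empty and contains only the characters A, C, T, G. */
    static boolean isValidSequence(String s) {
        return s != null && s.matches("[ACGT]+");
    }

    /** Builds the list whose element i is the n-letter word starting at position i. */
    static List<String> buildWords(String dna, int n) {
        List<String> words = new ArrayList<>();
        // The last word starts at position dna.length() - n.
        for (int i = 0; i + n <= dna.length(); i++) {
            words.add(dna.substring(i, i + n));
        }
        return words;
    }

    public static void main(String[] args) {
        String query = "CTG";
        if (!isValidSequence(query)) {
            throw new IllegalArgumentException("Query must contain only A, C, T, G");
        }
        int n = query.length();
        System.out.println(buildWords("ACTGGCCTA", n));
        // prints [ACT, CTG, TGG, GGC, GCC, CCT, CTA]
    }
}
```

A DNA sequence of length L yields L - n + 1 words, so the 9-letter example produces 7 words for n = 3.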
Compare the query String with each substring in the list, computing a score for each comparison: add +1 for each position where the two characters are identical and -1 for each position where they differ. Assuming the above example, the scores would be as follows.

DNA substring   Position   Score
-------------   --------   -----
ACT             0          -3
CTG             1          +3
TGG             2          -1
GGC             3          -3
GCC             4          -3
CCT             5          -1
CTA             6          +1
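The scoring rule described above can be sketched as a single method. The names `MatchScore` and `score` are hypothetical, and throwing an exception on unequal lengths is a design assumption rather than a requirement of the assignment.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper class; names are illustrative only. */
final class MatchScore {
    /** Adds +1 for each position where the characters agree and -1 where they differ. */
    static int score(String query, String word) {
        if (query.length() != word.length()) {
            throw new IllegalArgumentException("Query and word must have the same length");
        }
        int total = 0;
        for (int i = 0; i < query.length(); i++) {
            total += (query.charAt(i) == word.charAt(i)) ? 1 : -1;
        }
        return total;
    }

    public static void main(String[] args) {
        String query = "CTG";
        List<String> words = Arrays.asList("ACT", "CTG", "TGG", "GGC", "GCC", "CCT", "CTA");
        for (int i = 0; i < words.size(); i++) {
            // %+d prints an explicit sign, e.g. +3 or -1.
            System.out.printf("%s %d %+d%n", words.get(i), i, score(query, words.get(i)));
        }
    }
}
```

For a query of length n, the score always lies between -n and +n and has the same parity as n.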
Report the results: all "matches" whose score meets or exceeds the user-specified threshold, sorted from highest to lowest score. Also indicate the position of each matched substring within the DNA sequence (counting from 0, as in the list above). Assuming the above example, here is a sample output:
Query String: CTG
DNA file name: ecoli.txt
Threshold: +1

DNA substring   Position   Score
-------------   --------   -----
CTG             1          +3
CTA             6          +1
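The reporting step (filter by threshold, sort by descending score, print with positions) might look like the sketch below. The names `MatchReport`, `Match`, `findMatches`, and `format` are hypothetical, and the column widths only approximate the sample output. Ties keep their original position order because `List.sort` is stable; the assignment does not specify a tie-breaking rule.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical reporting helper; names are illustrative only. */
final class MatchReport {
    /** One reported match: the DNA word, its 0-based position, and its score. */
    static final class Match {
        final String word;
        final int position;
        final int score;

        Match(String word, int position, int score) {
            this.word = word;
            this.position = position;
            this.score = score;
        }
    }

    /** +1 for each matching position, -1 for each mismatch. */
    static int score(String query, String word) {
        int total = 0;
        for (int i = 0; i < query.length(); i++) {
            total += (query.charAt(i) == word.charAt(i)) ? 1 : -1;
        }
        return total;
    }

    /** Collects every word whose score is at least the threshold, highest score first. */
    static List<Match> findMatches(String dna, String query, int threshold) {
        int n = query.length();
        List<Match> matches = new ArrayList<>();
        for (int i = 0; i + n <= dna.length(); i++) {
            String word = dna.substring(i, i + n);
            int s = score(query, word);
            if (s >= threshold) {
                matches.add(new Match(word, i, s));
            }
        }
        matches.sort(Comparator.comparingInt((Match m) -> m.score).reversed());
        return matches;
    }

    /** Formats the matches as a plain-text table. */
    static String format(List<Match> matches) {
        StringBuilder sb = new StringBuilder();
        sb.append(String.format("%-15s %-10s %s%n", "DNA substring", "Position", "Score"));
        sb.append(String.format("%-15s %-10s %s%n", "-------------", "--------", "-----"));
        for (Match m : matches) {
            sb.append(String.format("%-15s %-10d %+d%n", m.word, m.position, m.score));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(format(findMatches("ACTGGCCTA", "CTG", 1)));
    }
}
```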
Write the list containing the DNA substrings to a binary file. The next time the program runs, give the user the option of reading the data from this file directly into the list rather than starting from scratch with the text file containing ASCII characters. This feature is especially useful if the DNA sequence is very long and the user wants to run the program multiple times on the same DNA sequence. Refer to pages 289-290 of the third edition (or pages 269-270 of the second edition or pages 201-202 of the updated edition) to see how to write an object to a file and how to read an object from a file.
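A minimal sketch of saving and reloading the list with Java's standard object streams follows. The textbook pages cited above are not reproduced here, so this may differ from the textbook's presentation; the names `WordCache`, `save`, and `load` are hypothetical. It relies on the fact that `ArrayList` and `String` both implement `Serializable`.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.Arrays;

/** Hypothetical cache helper; names are illustrative only. */
final class WordCache {
    /** Writes the whole list to a binary file as one serialized object. */
    static void save(ArrayList<String> words, File file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(
                new BufferedOutputStream(new FileOutputStream(file)))) {
            out.writeObject(words);
        }
    }

    /** Reads the list back; the cast is unchecked because the stream stores no generic type. */
    @SuppressWarnings("unchecked")
    static ArrayList<String> load(File file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(
                new BufferedInputStream(new FileInputStream(file)))) {
            return (ArrayList<String>) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        File file = File.createTempFile("dna-words", ".bin");
        file.deleteOnExit();
        ArrayList<String> words = new ArrayList<>(Arrays.asList("ACT", "CTG", "TGG"));
        save(words, file);
        System.out.println(load(file).equals(words)); // prints true
    }
}
```

Note that the stored words all have the length n used when the file was written, so a cached file is only reusable for queries of that same length.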
References
[1] Cutter, Pamela. "Having a BLAST: A Bioinformatics Project in CS2," SIGCSE 2007.
[2] http://en.wikipedia.org/wiki/BLAST
[3] http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/information3.html
[4] http://130.14.29.110/BLAST/