Ko Pang, Narayanan Mahesh, Kalyanaraman Anantharaman, Aluru Srinivas
Department of Electrical and Computer Engineering, Iowa State University, USA.
Proc IEEE Comput Syst Bioinform Conf. 2004:80-8. doi: 10.1109/csb.2004.1332420.
DNA-protein alignment algorithms can be used to discover coding sequences in a genomic sequence if the corresponding protein derivatives are known. They can also be used to identify potential coding sequences in a newly sequenced genome by using proteins from related species. Previously known algorithms either solve a simplified formulation or sacrifice optimality to achieve a practical implementation. In this paper, we present a comprehensive formulation of the DNA-protein alignment problem and an algorithm that computes the optimal alignment in O(mn) time using only four tables of size (m + 1) × (n + 1), where m and n are the lengths of the DNA and protein sequences, respectively. We also developed PanDA, a Protein and DNA Alignment program that implements the proposed solution. Experimental results indicate that our algorithm produces high-quality alignments.
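To make the O(mn) dynamic-programming idea concrete, the following is a minimal sketch of a much simpler DNA-protein alignment than the comprehensive formulation described above. It is not PanDA's algorithm: it assumes a single score table rather than four, linear gap penalties, no intron model, the standard genetic code, and uppercase DNA. The function name `align_score` and all scoring values are hypothetical and chosen only for illustration.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def align_score(dna, protein, match=2, mismatch=-1, gap=-2, frameshift=-3):
    """Global DNA-protein alignment score (illustrative scoring values).

    S[i][j] is the best score for aligning dna[:i] with protein[:j].
    The table has (m + 1) x (n + 1) cells and each cell is filled in
    constant time, so the running time is O(mn).
    """
    m, n = len(dna), len(protein)
    neg_inf = float("-inf")
    S = [[neg_inf] * (n + 1) for _ in range(m + 1)]
    S[0][0] = 0
    for i in range(m + 1):
        for j in range(n + 1):
            if i == 0 and j == 0:
                continue
            best = neg_inf
            if i >= 3 and j >= 1:
                # Translate one codon and align it with one amino acid.
                aa = CODON_TABLE.get(dna[i - 3:i])
                sub = match if aa == protein[j - 1] else mismatch
                best = max(best, S[i - 3][j - 1] + sub)
            if i >= 3:
                # A codon in the DNA with no counterpart in the protein.
                best = max(best, S[i - 3][j] + gap)
            if j >= 1:
                # An amino acid with no counterpart in the DNA.
                best = max(best, S[i][j - 1] + gap)
            if i >= 1:
                # Frameshift: skip one nucleotide.
                best = max(best, S[i - 1][j] + frameshift)
            if i >= 2:
                # Frameshift: skip two nucleotides.
                best = max(best, S[i - 2][j] + frameshift)
            S[i][j] = best
    return S[m][n]
```

Under these assumed scores, `align_score("ATGGCTAAA", "MAK")` aligns three codons to three amino acids, while inserting one extra nucleotide into the DNA forces a single frameshift step. A fuller treatment along the lines of the abstract would need additional tables and a richer model, which this sketch deliberately leaves out.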