Space-conserving optimal DNA-protein alignment.

Author information

Ko Pang, Narayanan Mahesh, Kalyanaraman Anantharaman, Aluru Srinivas

Affiliation

Department of Electrical and Computer Engineering, Iowa State University, USA.

Publication information

Proc IEEE Comput Syst Bioinform Conf. 2004:80-8. doi: 10.1109/csb.2004.1332420.

DOI:10.1109/csb.2004.1332420

PMID:16448002

Abstract

DNA-protein alignment algorithms can be used to discover coding sequences in a genomic sequence if the corresponding protein derivatives are known. They can also be used to identify potential coding sequences in a newly sequenced genome by using proteins from related species. Previously known algorithms either solve a simplified formulation or sacrifice optimality to achieve a practical implementation. In this paper, we present a comprehensive formulation of the DNA-protein alignment problem and an algorithm that computes the optimal alignment in O(mn) time using only four tables of size (m + 1) × (n + 1), where m and n are the lengths of the DNA and protein sequences, respectively. We also developed PanDA, a Protein and DNA Alignment program that implements the proposed solution. Experimental results indicate that our algorithm produces high-quality alignments.
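To illustrate the kind of dynamic program behind an O(mn) bound, the following is a minimal sketch of a much simplified DNA-protein alignment score, not the formulation or the PanDA implementation described in the abstract. It assumes a single (m + 1) × (n + 1) table rather than four, linear gap penalties, no intron modeling, uppercase DNA, and the standard genetic code; the function name `align_score` and all scoring values are hypothetical choices made for this example.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def align_score(dna, protein, match=2, mismatch=-1, gap=-2, frameshift=-3):
    """Best global score for aligning dna (length m) with protein (length n).

    Simplified, hypothetical scoring: every table cell is filled with a
    constant amount of work, so the running time is O(mn).
    """
    m, n = len(dna), len(protein)
    neg_inf = float("-inf")
    # score[i][j]: best score for dna[:i] aligned with protein[:j].
    score = [[neg_inf] * (n + 1) for _ in range(m + 1)]
    score[0][0] = 0
    for i in range(m + 1):
        for j in range(n + 1):
            if i == 0 and j == 0:
                continue
            best = neg_inf
            if j >= 1:
                # Amino acid with no corresponding codon.
                best = max(best, score[i][j - 1] + gap)
            if i >= 3:
                # Codon with no corresponding amino acid.
                best = max(best, score[i - 3][j] + gap)
                if j >= 1:
                    # Codon aligned with an amino acid.
                    aa = CODON_TABLE.get(dna[i - 3:i])
                    sub = match if aa == protein[j - 1] else mismatch
                    best = max(best, score[i - 3][j - 1] + sub)
            if i >= 1:
                # Frameshift: skip one nucleotide.
                best = max(best, score[i - 1][j] + frameshift)
            if i >= 2:
                # Frameshift: skip two nucleotides.
                best = max(best, score[i - 2][j] + frameshift)
            score[i][j] = best
    return score[m][n]


print(align_score("ATGGCC", "MA"))   # two codons, both match
print(align_score("ATGTGCC", "MA"))  # one extra nucleotide forces a frameshift
```

In this sketch, an extra nucleotide between two codons is absorbed by a single frameshift step, which is one of the cases a DNA-protein aligner has to handle that a plain protein-protein aligner does not.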
