Space-conserving optimal DNA-protein alignment.

Authors

Ko Pang, Narayanan Mahesh, Kalyanaraman Anantharaman, Aluru Srinivas

Affiliation

Department of Electrical and Computer Engineering, Iowa State University, USA.

Publication

Proc IEEE Comput Syst Bioinform Conf. 2004:80-8. doi: 10.1109/csb.2004.1332420.

Abstract

DNA-protein alignment algorithms can be used to discover coding sequences in a genomic sequence if the corresponding protein derivatives are known. They can also be used to identify potential coding sequences in a newly sequenced genome by using proteins from related species. Previously known algorithms either solve a simplified formulation or sacrifice optimality to achieve a practical implementation. In this paper, we present a comprehensive formulation of the DNA-protein alignment problem and an algorithm that computes the optimal alignment in O(mn) time using only four tables of size (m + 1) × (n + 1), where m and n are the lengths of the DNA and protein sequences, respectively. We also developed a Protein and DNA Alignment program, PanDA, that implements the proposed solution. Experimental results indicate that our algorithm produces high-quality alignments.
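
To make the O(mn) dynamic-programming idea concrete, here is a minimal sketch of a much simpler DNA-protein alignment than the one the abstract describes. It is not the paper's formulation and not the PanDA implementation: it fills a single (m + 1) × (n + 1) score table rather than four, uses linear gap costs, and does not model introns. The function name `align_score` and all scoring parameters (`match`, `mismatch`, `gap`, `frameshift`) are hypothetical choices made for illustration only.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def align_score(dna, protein, match=2, mismatch=-1, gap=-2, frameshift=-3):
    """Global alignment score of a DNA string against a protein string.

    Illustrative scoring (all values are assumptions):
      - a codon aligned to a residue scores `match` or `mismatch`
      - an unaligned codon or an unaligned residue costs `gap`
      - skipping a single DNA base costs `frameshift`
    """
    m, n = len(dna), len(protein)
    neg_inf = float("-inf")
    # score[i][j]: best score aligning dna[:i] with protein[:j]
    score = [[neg_inf] * (n + 1) for _ in range(m + 1)]
    score[0][0] = 0
    for i in range(m + 1):
        for j in range(n + 1):
            if i == 0 and j == 0:
                continue
            best = neg_inf
            if i >= 3 and j >= 1:
                # Translate the codon ending at position i; unknown codons map to "X".
                aa = CODON_TABLE.get(dna[i - 3:i], "X")
                step = match if aa == protein[j - 1] else mismatch
                best = max(best, score[i - 3][j - 1] + step)
            if i >= 3:
                best = max(best, score[i - 3][j] + gap)         # codon with no residue
            if j >= 1:
                best = max(best, score[i][j - 1] + gap)         # residue with no codon
            if i >= 1:
                best = max(best, score[i - 1][j] + frameshift)  # skip one base
            score[i][j] = best
    return score[m][n]


print(align_score("ATGGCC", "MA"))   # ATG -> M, GCC -> A
print(align_score("ATGAGCC", "MA"))  # same, with one extra base inserted
```

Each cell is computed from a constant number of earlier cells, so the running time is proportional to the table size, which is where a bound of the form O(mn) comes from. Recovering the alignment itself, rather than just its score, would require a traceback over the table.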
