Mango: multiple alignment with N gapped oligos.

Author information

Zhang Zefeng, Lin Hao, Li Ming

Affiliation

Computational Biology Research Group, Division of Intelligent Software Systems, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China.

Publication information

J Bioinform Comput Biol. 2008 Jun;6(3):521-41. doi: 10.1142/s0219720008003527.

Abstract

Multiple sequence alignment is a classical and challenging task. The problem is NP-hard. The full dynamic programming takes too much time. The progressive alignment heuristics adopted by most state-of-the-art works suffer from the "once a gap, always a gap" phenomenon. Is there a radically new way to do multiple sequence alignment? In this paper, we introduce a novel and orthogonal multiple sequence alignment method, using both multiple optimized spaced seeds and new algorithms to handle these seeds efficiently. Our new algorithm processes information of all sequences as a whole and tries to build the alignment vertically, avoiding problems caused by the popular progressive approaches. Because the optimized spaced seeds have proved significantly more sensitive than the consecutive k-mers, the new approach promises to be more accurate and reliable. To validate our new approach, we have implemented MANGO: Multiple Alignment with N Gapped Oligos. Experiments were carried out on large 16S RNA benchmarks, showing that MANGO compares favorably, in both accuracy and speed, against state-of-the-art multiple sequence alignment methods, including ClustalW 1.83, MUSCLE 3.6, MAFFT 5.861, ProbConsRNA 1.11, Dialign 2.2.1, DIALIGN-T 0.2.1, T-Coffee 4.85, POA 2.0, and Kalign 2.0. We have further demonstrated the scalability of MANGO on very large datasets of repeat elements. MANGO can be downloaded at http://www.bioinfo.org.cn/mango/ and is free for academic usage.
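To make the contrast between spaced seeds and consecutive k-mers concrete, the following is a minimal sketch and not MANGO's actual algorithm. The function names `seed_keys` and `seed_hits` and the pattern `1101` are hypothetical choices made for illustration; the abstract does not state which seed patterns MANGO uses. In a seed pattern, `1` marks a position that must match and `0` marks a position that is ignored.

```python
from collections import defaultdict


def seed_keys(seq, pattern):
    """Yield (start, key) for every window of seq; key keeps only the '1' positions."""
    care = [i for i, c in enumerate(pattern) if c == "1"]
    for start in range(len(seq) - len(pattern) + 1):
        yield start, "".join(seq[start + i] for i in care)


def seed_hits(a, b, pattern):
    """Return sorted (pos_in_a, pos_in_b) pairs whose windows agree at every '1' position."""
    index = defaultdict(list)
    for pos, key in seed_keys(a, pattern):
        index[key].append(pos)
    return sorted(
        (pa, pb)
        for pb, key in seed_keys(b, pattern)
        for pa in index.get(key, [])
    )


if __name__ == "__main__":
    a, b = "ACGTA", "ACTTA"  # toy sequences differing at one position
    print(seed_hits(a, b, "111"))   # consecutive 3-mer seed
    print(seed_hits(a, b, "1101"))  # spaced seed of the same weight (three '1's)
```

On this toy pair, the single mismatch falls inside every consecutive 3-mer window, so the `111` seed finds no hit, while the spaced seed `1101` tolerates the mismatch at its ignored position and reports a hit at offset 0 in both sequences. This is only a one-example illustration of the sensitivity argument, not a measurement.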
