DIALIGN-TX: greedy and progressive approaches for segment-based multiple sequence alignment.

Authors

Subramanian Amarendran R, Kaufmann Michael, Morgenstern Burkhard

Affiliation

University of Tübingen, Wilhelm-Schickard-Institut für Informatik, Sand 13, 72076 Tübingen, Germany.

Publication

Algorithms Mol Biol. 2008 May 27;3:6. doi: 10.1186/1748-7188-3-6.

Abstract

BACKGROUND

DIALIGN-T is a reimplementation of the multiple-alignment program DIALIGN. Due to several algorithmic improvements, it produces significantly better alignments on locally and globally related sequence sets than previous versions of DIALIGN. However, like the original implementation of the program, DIALIGN-T uses a straightforward greedy approach to assemble multiple alignments from local pairwise sequence similarities. Such greedy approaches may be vulnerable to spurious random similarities and can therefore lead to suboptimal results. In this paper, we present DIALIGN-TX, a substantial improvement of DIALIGN-T that combines our previous greedy algorithm with a progressive alignment approach.
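
The weakness of greedy assembly described above can be illustrated with a toy sketch. It is not the DIALIGN-T code: it is restricted to a single pair of sequences, and the `Fragment` type, the `consistent` test, and the weights are assumptions made up for illustration. The real consistency check across many sequences is more involved and is not reproduced here.

```python
from typing import List, NamedTuple


class Fragment(NamedTuple):
    """Hypothetical gap-free segment pair between two sequences A and B."""
    start_a: int   # start position in sequence A
    start_b: int   # start position in sequence B
    length: int    # number of aligned positions
    weight: float  # similarity score (made-up values below)


def consistent(f: Fragment, g: Fragment) -> bool:
    """True if one fragment lies entirely before the other in both sequences."""
    f_before_g = (f.start_a + f.length <= g.start_a
                  and f.start_b + f.length <= g.start_b)
    g_before_f = (g.start_a + g.length <= f.start_a
                  and g.start_b + g.length <= f.start_b)
    return f_before_g or g_before_f


def greedy_select(fragments: List[Fragment]) -> List[Fragment]:
    """Accept fragments by decreasing weight if they fit everything accepted so far."""
    accepted: List[Fragment] = []
    for frag in sorted(fragments, key=lambda f: f.weight, reverse=True):
        if all(consistent(frag, other) for other in accepted):
            accepted.append(frag)
    return accepted


fragments = [
    Fragment(0, 20, 5, 10.0),   # spurious hit: conflicts with both fragments below
    Fragment(0, 0, 5, 8.0),
    Fragment(10, 10, 5, 8.0),
]
chosen = greedy_select(fragments)
print(chosen)
```

In this toy input the single high-weight fragment is accepted first and blocks the two mutually consistent fragments, whose combined weight (16.0) exceeds its own (10.0). That is the kind of suboptimal result a spurious similarity can cause in a purely greedy scheme.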

RESULTS

Our new heuristic produces significantly better alignments, especially on globally related sequences, without excessively increasing CPU time and memory consumption. The new method is based on a guide tree; to detect possibly spurious sequence similarities, it employs a vertex-cover approximation on a conflict graph. We performed benchmarking tests on a large set of nucleic acid and protein sequences. For protein benchmarks, we used the benchmark database BALIBASE 3 and an updated release of the database IRMBASE 2 to assess alignment quality on globally and locally related sequences, respectively. For alignment of nucleic acid sequences, we used BRAliBase II for global alignment and a newly developed database of locally related sequences called DIRMBASE 1. IRMBASE 2 and DIRMBASE 1 are constructed by implanting highly conserved motifs at random positions in long unalignable sequences.
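
As a rough illustration of the conflict-graph idea, the sketch below applies the textbook 2-approximation for vertex cover (take both endpoints of any edge not yet covered). The abstract does not say which approximation DIALIGN-TX uses, so this is an assumption, as are the fragment labels and the conflict edges. Vertices stand for fragments, and an edge joins two fragments that cannot both appear in the same alignment.

```python
from typing import Iterable, Set, Tuple


def approx_vertex_cover(edges: Iterable[Tuple[str, str]]) -> Set[str]:
    """Textbook 2-approximation: add both endpoints of each still-uncovered edge."""
    cover: Set[str] = set()
    for u, v in edges:
        if u not in cover and v not in cover:
            cover.update((u, v))
    return cover


# Hypothetical conflict graph: each edge joins two mutually inconsistent fragments.
conflicts = [("f1", "f2"), ("f1", "f3"), ("f4", "f5")]
all_fragments = {"f1", "f2", "f3", "f4", "f5"}

cover = approx_vertex_cover(conflicts)
remaining = all_fragments - cover  # fragments left after discarding the cover

print(sorted(cover), sorted(remaining))
```

Every conflict edge has at least one endpoint in the cover, so the fragments that remain are pairwise conflict-free. The returned cover is at most twice the size of a minimum cover, which means the heuristic may discard more fragments than strictly necessary.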

CONCLUSION

On BALIBASE 3, our new program performs significantly better than the previous program DIALIGN-T and outperforms the popular global aligner CLUSTAL W, though it is still outperformed by programs that focus on global alignment, such as MAFFT, MUSCLE and T-COFFEE. On the locally related test sets in IRMBASE 2 and DIRMBASE 1, our method outperforms all other programs, and MAFFT E-INSi is the only method that comes close to the performance of DIALIGN-TX.
