DIALIGN-TX: greedy and progressive approaches for segment-based multiple sequence alignment.

Author information

Subramanian Amarendran R, Kaufmann Michael, Morgenstern Burkhard

Affiliation

University of Tübingen, Wilhelm-Schickard-Institut für Informatik, Sand 13, 72076 Tübingen, Germany.

Publication information

Algorithms Mol Biol. 2008 May 27;3:6. doi: 10.1186/1748-7188-3-6.

DOI:10.1186/1748-7188-3-6

PMID:18505568

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC2430965/

Abstract

BACKGROUND

DIALIGN-T is a reimplementation of the multiple-alignment program DIALIGN. Due to several algorithmic improvements, it produces significantly better alignments on locally and globally related sequence sets than previous versions of DIALIGN. However, like the original implementation of the program, DIALIGN-T uses a straightforward greedy approach to assemble multiple alignments from local pairwise sequence similarities. Such greedy approaches may be vulnerable to spurious random similarities and can therefore lead to suboptimal results. In this paper, we present DIALIGN-TX, a substantial improvement of DIALIGN-T that combines our previous greedy algorithm with a progressive alignment approach.
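To illustrate why a purely greedy assembly can be misled by a spurious similarity, the following is a minimal sketch restricted to two sequences. It is not the DIALIGN-T implementation: the `Fragment` type, the `consistent` test (fragments must be strictly ordered on both sequences) and the example weights are assumptions chosen for illustration only.

```python
from typing import List, NamedTuple


class Fragment(NamedTuple):
    """Hypothetical gap-free local similarity between sequences A and B."""
    start_a: int   # start position in sequence A
    start_b: int   # start position in sequence B
    length: int    # number of aligned positions
    weight: float  # similarity score (assumed to be given)


def consistent(f: Fragment, g: Fragment) -> bool:
    """True if f and g can coexist in one pairwise alignment, i.e. one of
    them ends before the other starts on BOTH sequences."""
    f_before_g = (f.start_a + f.length <= g.start_a
                  and f.start_b + f.length <= g.start_b)
    g_before_f = (g.start_a + g.length <= f.start_a
                  and g.start_b + g.length <= f.start_b)
    return f_before_g or g_before_f


def greedy_assemble(fragments: List[Fragment]) -> List[Fragment]:
    """Accept fragments in order of decreasing weight, skipping any
    fragment that conflicts with one accepted earlier."""
    accepted: List[Fragment] = []
    for frag in sorted(fragments, key=lambda f: f.weight, reverse=True):
        if all(consistent(frag, other) for other in accepted):
            accepted.append(frag)
    return sorted(accepted, key=lambda f: f.start_a)


if __name__ == "__main__":
    # One high-scoring random hit far off the diagonal ...
    spurious = Fragment(start_a=0, start_b=50, length=10, weight=9.0)
    # ... and three mutually consistent fragments with lower weights.
    true_frags = [Fragment(5, 5, 10, 8.0),
                  Fragment(20, 20, 10, 7.0),
                  Fragment(40, 40, 10, 6.0)]
    result = greedy_assemble([spurious] + true_frags)
    print(result, sum(f.weight for f in result))
```

In this toy input the spurious fragment is accepted first because it has the highest weight, and it then blocks all three lower-weight fragments, although those three together would have scored more. This is the kind of suboptimal result the abstract attributes to greedy assembly.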

RESULTS

Our new heuristic produces significantly better alignments, especially on globally related sequences, without excessively increasing CPU time or memory consumption. The new method is based on a guide tree; to detect possible spurious sequence similarities, it employs a vertex-cover approximation on a conflict graph. We performed benchmarking tests on a large set of nucleic acid and protein sequences. For protein benchmarks, we used the benchmark database BALIBASE 3 and an updated release of the database IRMBASE 2 to assess alignment quality on globally and locally related sequences, respectively. For alignment of nucleic acid sequences, we used BRAliBase II for global alignment and a newly developed database of locally related sequences called DIRMBASE 1. IRMBASE 2 and DIRMBASE 1 are constructed by implanting highly conserved motifs at random positions in long unalignable sequences.
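The abstract does not say which vertex-cover approximation is used or how the conflict graph is built. As a sketch under stated assumptions, the code below treats each fragment as a node, each pair of mutually inconsistent fragments as an edge, and applies the textbook pricing scheme for weighted vertex cover (a 2-approximation), followed by a pruning pass. Removing the returned cover leaves a conflict-free set of fragments. The function name, the weights and the example graph are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Set, Tuple


def approx_weighted_vertex_cover(
    weights: Dict[Hashable, float],
    conflicts: Iterable[Tuple[Hashable, Hashable]],
) -> Set[Hashable]:
    """Return a set of nodes touching every conflict edge.

    Pricing scheme: each edge 'charges' both endpoints the smaller of
    their remaining weights; a node whose remaining weight reaches zero
    enters the cover. The cover weight is at most twice the optimum.
    Assumes non-negative weights and no self-loops.
    """
    residual = dict(weights)
    neighbours = defaultdict(set)
    for u, v in conflicts:
        neighbours[u].add(v)
        neighbours[v].add(u)
        pay = min(residual[u], residual[v])
        residual[u] -= pay
        residual[v] -= pay

    cover = {node for node in neighbours if residual[node] == 0}

    # Pruning: drop a node if all of its neighbours are in the cover,
    # trying the heaviest nodes first. The result is still a valid cover.
    for node in sorted(cover, key=lambda n: (-weights[n], str(n))):
        if neighbours[node] <= cover:
            cover.discard(node)
    return cover


if __name__ == "__main__":
    # Hypothetical conflict graph: one fragment conflicts with three others.
    weights = {"spurious": 9, "f1": 8, "f2": 7, "f3": 6}
    conflicts = [("spurious", "f1"), ("spurious", "f2"), ("spurious", "f3")]
    removed = approx_weighted_vertex_cover(weights, conflicts)
    kept = set(weights) - removed
    print("removed:", removed, "kept:", kept)
```

In this toy graph the cover consists of the single fragment that conflicts with everything else, so the three mutually consistent fragments are kept. Whether DIALIGN-TX weights nodes this way is not stated in the abstract.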

CONCLUSION

On BALIBASE 3, our new program performs significantly better than the previous program DIALIGN-T and outperforms the popular global aligner CLUSTAL W, though it is still outperformed by programs that focus on global alignment, such as MAFFT, MUSCLE and T-COFFEE. On the locally related test sets in IRMBASE 2 and DIRMBASE 1, our method outperforms all other programs, with MAFFT E-INSi being the only method that comes close to the performance of DIALIGN-TX.
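The locally related test sets are described in the Results as conserved motifs implanted at random positions in long unalignable sequences. The sketch below shows that construction idea only. It is not the procedure used to build IRMBASE 2 or DIRMBASE 1: the function name, the parameters, the uniform random background and the use of an identical motif copy in every sequence are simplifying assumptions.

```python
import random
from typing import List, Tuple


def implant_motif(
    motif: str,
    n_sequences: int,
    background_length: int,
    alphabet: str = "ACGT",
    seed: int = 0,
) -> List[Tuple[str, int]]:
    """Build toy test sequences: random background with one motif copy.

    Returns (sequence, motif_start) pairs so that a reference alignment
    of the motif columns can be derived from the recorded positions.
    """
    rng = random.Random(seed)  # seeded for reproducibility
    records = []
    for _ in range(n_sequences):
        background = "".join(rng.choice(alphabet)
                             for _ in range(background_length))
        pos = rng.randrange(background_length + 1)  # insertion point
        sequence = background[:pos] + motif + background[pos:]
        records.append((sequence, pos))
    return records


if __name__ == "__main__":
    for seq, pos in implant_motif("GATTACAGATTACA", n_sequences=3,
                                  background_length=60):
        print(pos, seq)
```

An aligner that handles local similarity well should place the implanted motif occurrences in the same columns while leaving the random flanks effectively unaligned.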

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0157/2430965/01f73592b4fb/1748-7188-3-6-1.jpg
