DACTAL: divide-and-conquer trees (almost) without alignments.

Affiliation

Department of Computer Science, Calvin College, Grand Rapids, MI 49546, USA.

Publication information

Bioinformatics. 2012 Jun 15;28(12):i274-82. doi: 10.1093/bioinformatics/bts218.

DOI:10.1093/bioinformatics/bts218

PMID:22689772

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3371850/

Abstract

MOTIVATION

While phylogenetic analyses of datasets containing 1000-5000 sequences are challenging for existing methods, the estimation of substantially larger phylogenies poses a problem of much greater complexity and scale.

METHODS

We present DACTAL, a method for phylogeny estimation that produces trees from unaligned sequence datasets without ever needing to estimate an alignment on the entire dataset. DACTAL combines iteration with a novel divide-and-conquer approach, so that each iteration begins with a tree produced in the prior iteration, decomposes the taxon set into overlapping subsets, estimates trees on each subset, and then combines the smaller trees into a tree on the full taxon set using a new supertree method. We prove that DACTAL is guaranteed to produce the true tree under certain conditions. We compare DACTAL to SATé and maximum likelihood trees on estimated alignments using simulated and real datasets with 1000 to 27,643 taxa.
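The iteration described in this paragraph can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the DACTAL implementation: the sliding-window rule in `overlapping_subsets` is a simple stand-in for the paper's decomposition, and `estimate_subtree`, `merge_subtrees`, and `leaf_order_of` are hypothetical callables that a user would back with a real subset tree estimator, a supertree method, and a tree traversal.

```python
from typing import Callable, List, Sequence, TypeVar

Tree = TypeVar("Tree")  # whatever tree representation the caller uses


def overlapping_subsets(leaf_order: Sequence[str], size: int, overlap: int) -> List[List[str]]:
    """Cut a leaf ordering into windows of at most `size` taxa.

    Consecutive windows share `overlap` taxa. This sliding-window rule is an
    assumption made for illustration, not the decomposition DACTAL uses.
    """
    if not 0 <= overlap < size:
        raise ValueError("need 0 <= overlap < size")
    step = size - overlap
    subsets: List[List[str]] = []
    start = 0
    while start < len(leaf_order):
        subsets.append(list(leaf_order[start:start + size]))
        if start + size >= len(leaf_order):
            break  # the last window already reaches the final taxon
        start += step
    return subsets


def divide_and_conquer_rounds(
    leaf_order: Sequence[str],
    estimate_subtree: Callable[[List[str]], Tree],   # hypothetical: tree on one subset
    merge_subtrees: Callable[[List[Tree]], Tree],    # hypothetical: supertree step
    leaf_order_of: Callable[[Tree], Sequence[str]],  # hypothetical: leaves of a tree, in order
    rounds: int,
    size: int,
    overlap: int,
) -> Tree:
    """Repeat: decompose, estimate subset trees, merge, then reuse the merged tree."""
    if rounds < 1:
        raise ValueError("need at least one round")
    for _ in range(rounds):
        subsets = overlapping_subsets(leaf_order, size, overlap)
        subtrees = [estimate_subtree(subset) for subset in subsets]
        tree = merge_subtrees(subtrees)
        # The merged tree guides the decomposition in the next round.
        leaf_order = leaf_order_of(tree)
    return tree


if __name__ == "__main__":
    taxa = ["t1", "t2", "t3", "t4", "t5", "t6", "t7"]
    for subset in overlapping_subsets(taxa, size=4, overlap=2):
        print(subset)
```

The overlap between neighbouring subsets is what gives a supertree step shared taxa to anchor the merge on; with `overlap=0` the subsets would be disjoint and the subset trees would have no taxa in common.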

RESULTS

Our studies show that on average DACTAL yields more accurate trees than the two-phase methods we studied on very large datasets that are difficult to align, and has approximately the same accuracy on the easier datasets. The comparison to SATé shows that both have the same accuracy, but that DACTAL achieves this accuracy in a fraction of the time. Furthermore, DACTAL can analyze larger datasets than SATé, including a dataset with almost 28,000 sequences.

AVAILABILITY

DACTAL source code and results of dataset analyses are available at www.cs.utexas.edu/users/phylo/software/dactal.
