Constrained incremental tree building: new absolute fast converging phylogeny estimation methods with improved scalability and accuracy.

Author information

Zhang Qiuyi, Rao Satish, Warnow Tandy

Affiliations

1Department of Mathematics, University of California Berkeley, Evans Hall, Berkeley, CA 94720 USA.

2Department of Computer Science, University of California Berkeley, SODA Hall, Berkeley, CA 94720 USA.

Publication information

Algorithms Mol Biol. 2019 Feb 6;14:2. doi: 10.1186/s13015-019-0136-9. eCollection 2019.

Abstract

BACKGROUND

Absolute fast converging (AFC) phylogeny estimation methods are methods that have been proven to recover the true tree with high probability from sequences whose lengths are polynomial in the number of leaves in the tree (once the shortest and longest branch weights are fixed). Although there is a large literature on AFC methods, the best in terms of empirical performance was a method published in SODA 2001. The main empirical advantage of that method over other AFC methods is its use of neighbor joining (NJ) to construct trees on smaller taxon subsets, which are then combined into a tree on the full set of species using a supertree method; in contrast, the other AFC methods in essence depend on quartet trees that are computed independently of each other, which reduces accuracy compared to neighbor joining. However, that method is unlikely to scale to large datasets because of its reliance on supertree methods: no current supertree method scales to large datasets with high accuracy.
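Neighbor joining is the component credited above with the accuracy advantage over quartet-based AFC methods. As a rough illustration of how NJ builds a tree agglomeratively from a distance matrix, here is a minimal Python sketch of the standard NJ Q-criterion. It returns the topology only (no branch lengths) and is not the implementation used in the paper; the function name `neighbor_joining` and the nested-tuple tree representation are illustrative assumptions.

```python
from itertools import combinations


def neighbor_joining(labels, matrix):
    """Sketch of neighbor joining (topology only, hypothetical helper).

    labels: list of unique taxon names.
    matrix: symmetric distance matrix, matrix[a][b] = distance between labels a and b.
    Returns the final three clusters, which meet at a single internal node;
    each cluster is a leaf label or a nested tuple of joined clusters.
    """
    # Distances between active clusters, keyed by unordered pair.
    d = {}
    for a, b in combinations(range(len(labels)), 2):
        d[frozenset((labels[a], labels[b]))] = matrix[a][b]

    def dist(x, y):
        return d[frozenset((x, y))]

    nodes = list(labels)
    while len(nodes) > 3:
        n = len(nodes)
        # r[x] = total distance from cluster x to all other active clusters.
        r = {x: sum(dist(x, y) for y in nodes if y != x) for x in nodes}
        # Join the pair minimizing Q(i, j) = (n - 2) * d(i, j) - r[i] - r[j].
        i, j = min(
            combinations(nodes, 2),
            key=lambda p: (n - 2) * dist(*p) - r[p[0]] - r[p[1]],
        )
        u = (i, j)
        # Distance from the new cluster u to every remaining cluster k.
        for k in nodes:
            if k != i and k != j:
                d[frozenset((u, k))] = (dist(i, k) + dist(j, k) - dist(i, j)) / 2
        nodes = [k for k in nodes if k != i and k != j] + [u]
    return tuple(nodes)


if __name__ == "__main__":
    labels = ["A", "B", "C", "D"]
    # Additive distances for the unrooted tree ((A,B),(C,D)).
    m = [[0, 3, 5, 6], [3, 0, 6, 7], [5, 6, 0, 3], [6, 7, 3, 0]]
    print(neighbor_joining(labels, m))  # ('C', 'D', ('A', 'B'))
```

On this four-taxon example the pairs (A,B) and (C,D) tie on the Q-criterion; joining either one yields the same unrooted topology, and `min` simply picks the first pair it encounters.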

RESULTS

In this study we present a new approach to large-scale phylogeny estimation that shares some of the features of the SODA 2001 method but bypasses the use of supertree methods. We prove that this new approach is AFC and uses polynomial time and space. Furthermore, we describe variations on this basic approach that can be used with leaf-disjoint constraint trees (computed using methods such as maximum likelihood) to produce other methods that are likely to provide even better accuracy. Thus, we present a new generalizable technique for large-scale tree estimation that is designed to improve the scalability of phylogeny estimation methods to ultra-large datasets, and that can be used in a variety of settings (including tree estimation from unaligned sequences, and species tree estimation from gene trees).
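For reference, one common way to state the AFC property formally, following the earlier AFC literature, is given below. The stochastic model $\mathcal{M}$ (for example Jukes-Cantor) and the symbols $f$, $g$, $\varepsilon$, $p$, $k$ and $\Phi$ are notation introduced here as an assumption, not taken from this abstract:

$$
\forall f, g, \varepsilon > 0 \;\; \exists \text{ a polynomial } p \;\text{ such that }\; \forall (T, \theta) \in \mathcal{M}_{f,g}: \quad k \ge p(n) \;\Longrightarrow\; \Pr\big[\Phi(S) = T\big] \ge 1 - \varepsilon
$$

Here $\Phi$ is the estimation method, $\mathcal{M}_{f,g}$ is the set of model trees $(T, \theta)$ on $n$ leaves whose branch weights all lie in $[f, g]$, and $S$ is the set of $n$ sequences of length $k$ evolved down $(T, \theta)$ under the model. The point of the definition is that, once $f$ and $g$ are fixed, the sequence length needed for reliable recovery grows only polynomially in $n$.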


