Statistically consistent divide-and-conquer pipelines for phylogeny estimation using NJMerge.

Authors

Molloy Erin K, Warnow Tandy

Affiliation

Department of Computer Science, University of Illinois at Urbana-Champaign, 201 North Goodwin Avenue, Urbana, IL 61801 USA.

Publication information

Algorithms Mol Biol. 2019 Jul 19;14:14. doi: 10.1186/s13015-019-0151-x. eCollection 2019.

DOI:10.1186/s13015-019-0151-x
PMID:31360216
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6642500/
Abstract

BACKGROUND

Divide-and-conquer methods, which divide the species set into overlapping subsets, construct a tree on each subset, and then combine the subset trees using a supertree method, provide a key algorithmic framework for boosting the scalability of phylogeny estimation methods to large datasets. Yet the use of supertree methods, which typically attempt to solve NP-hard optimization problems, limits the scalability of such approaches.

RESULTS

In this paper, we introduce a divide-and-conquer approach that does not require supertree estimation: we divide the species set into pairwise disjoint subsets, construct a tree on each subset using a base method, and then combine the subset trees using a distance matrix. For this merger step, we present a new method, called NJMerge, which is a polynomial-time extension of Neighbor Joining (NJ); thus, NJMerge can be viewed either as a method for improving traditional NJ or as a method for scaling the base method to larger datasets. We prove that NJMerge can be used to create divide-and-conquer pipelines that are statistically consistent under some models of evolution. We also report the results of an extensive simulation study evaluating NJMerge on multi-locus datasets with up to 1000 species. We found that NJMerge sometimes improved the accuracy of traditional NJ and substantially reduced the running time of three popular species tree methods (ASTRAL-III, SVDquartets, and "concatenation" using RAxML) without sacrificing accuracy. Finally, although NJMerge can fail to return a tree, in our experiments, NJMerge failed on only 11 out of 2560 test cases.
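As background for the merger step, the sketch below shows the pair-selection rule of classical Neighbor Joining, the method that NJMerge extends. It is not NJMerge itself: the abstract does not describe how the subset trees enter the computation, so no constraint handling is attempted here. The function name `nj_pick_pair` and the four-taxon distance matrix are illustrative assumptions, not taken from the paper or its code.

```python
from itertools import combinations


def nj_pick_pair(dist):
    """Return the index pair (i, j), i < j, that minimizes the classical
    Neighbor Joining criterion Q(i, j) = (n - 2) * d(i, j) - r_i - r_j,
    where r_i is the sum of row i of the distance matrix.

    `dist` is a symmetric n x n matrix (list of lists) with a zero diagonal.
    """
    n = len(dist)
    if n < 3:
        raise ValueError("need at least 3 taxa")
    row_sums = [sum(row) for row in dist]
    best_pair, best_q = None, float("inf")
    for i, j in combinations(range(n), 2):
        q = (n - 2) * dist[i][j] - row_sums[i] - row_sums[j]
        if q < best_q:  # strict '<' keeps the first pair on ties
            best_pair, best_q = (i, j), q
    return best_pair


# Hypothetical additive distances for the tree ((a,b),(c,d)) with pendant
# branch lengths a=1, b=2, c=3, d=4 and an internal branch of length 5.
DIST = [
    [0, 3, 9, 10],
    [3, 0, 10, 11],
    [9, 10, 0, 7],
    [10, 11, 7, 0],
]

if __name__ == "__main__":
    print(nj_pick_pair(DIST))
```

On a four-taxon additive matrix such as this one, the two true sibling pairs, (a, b) and (c, d), tie for the minimum Q value, so either is a correct choice; this sketch returns whichever it encounters first. A full NJ implementation would then join the chosen pair, update the distance matrix, and repeat until the tree is resolved.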

CONCLUSIONS

Theoretical and empirical results suggest that NJMerge is a valuable technique for large-scale phylogeny estimation, especially when computational resources are limited. NJMerge is freely available on GitHub (http://github.com/ekmolloy/njmerge).
