Recent progress on methods for estimating and updating large phylogenies.

Affiliation

Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA.

Publication information

Philos Trans R Soc Lond B Biol Sci. 2022 Oct 10;377(1861):20210244. doi: 10.1098/rstb.2021.0244. Epub 2022 Aug 22.

DOI:10.1098/rstb.2021.0244

PMID:35989607

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9393559/

Abstract

With the increased availability of sequence data and even of fully sequenced and assembled genomes, phylogeny estimation of very large trees (even of hundreds of thousands of sequences) is now a goal for some biologists. Yet, the construction of these phylogenies is a complex pipeline presenting analytical and computational challenges, especially when the number of sequences is very large. In the past few years, new methods have been developed that aim to enable highly accurate phylogeny estimations on these large datasets, including divide-and-conquer techniques for multiple sequence alignment and/or tree estimation, methods that can estimate species trees from multi-locus datasets while addressing heterogeneity due to biological processes (e.g. incomplete lineage sorting and gene duplication and loss), and methods to add sequences into large gene trees or species trees. Here we present some of these recent advances and discuss opportunities for future improvements. This article is part of a discussion meeting issue 'Genomic population structures of microbial pathogens'.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/da23/9393559/6e4e6fde61ef/rstb20210244f01.jpg
