matOptimize: a parallel tree optimization method enables online phylogenetics for SARS-CoV-2.

Affiliations

Department of Electrical and Computer Engineering, University of California, San Diego, San Diego, CA 92093, USA.

Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, CA 95064, USA.

Publication information

Bioinformatics. 2022 Aug 2;38(15):3734-3740. doi: 10.1093/bioinformatics/btac401.

DOI: 10.1093/bioinformatics/btac401
PMID: 35731204
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9344837/

Abstract

MOTIVATION

Phylogenetic tree optimization is necessary for precise analysis of evolutionary and transmission dynamics, but existing tools are inadequate for handling the scale and pace of data produced during the coronavirus disease 2019 (COVID-19) pandemic. One transformative approach, online phylogenetics, aims to incrementally add samples to an ever-growing phylogeny, but there are no previously existing approaches that can efficiently optimize this vast phylogeny under the time constraints of the pandemic.

RESULTS

Here, we present matOptimize, a fast and memory-efficient phylogenetic tree optimization tool based on parsimony that can be parallelized across multiple CPU threads and nodes, and provides orders of magnitude improvement in runtime and peak memory usage compared to existing state-of-the-art methods. We have developed this method particularly to address the pressing need during the COVID-19 pandemic for daily maintenance and optimization of a comprehensive SARS-CoV-2 phylogeny. matOptimize is currently helping refine on a daily basis possibly the largest-ever phylogenetic tree, containing millions of SARS-CoV-2 sequences.
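To make the parsimony criterion concrete, the following is a minimal sketch of Fitch's small-parsimony scoring on a toy four-sample alignment. It is not matOptimize's implementation: the function names (`fitch_site`, `parsimony_score`), the nested-tuple tree encoding, and the example sequences are all assumptions chosen for illustration. It only shows what a parsimony score measures, namely the minimum number of mutations a given tree topology needs to explain the sequences; an optimizer then searches over tree rearrangements for a topology with a lower score.

```python
from typing import Dict, Set, Tuple, Union

# Hypothetical encoding: a tree is a leaf name or a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch_site(tree: Tree, states: Dict[str, str]) -> Tuple[Set[str], int]:
    """Fitch bottom-up pass for one alignment column.

    Returns the candidate ancestral states at this node and the minimum
    number of mutations required in the subtree below it.
    """
    if isinstance(tree, str):
        # Leaf: its observed nucleotide, no mutations needed.
        return {states[tree]}, 0
    left_set, left_cost = fitch_site(tree[0], states)
    right_set, right_cost = fitch_site(tree[1], states)
    common = left_set & right_set
    if common:
        # Children agree on at least one state: no extra mutation.
        return common, left_cost + right_cost
    # Children disagree: take the union and count one mutation.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree: Tree, seqs: Dict[str, str]) -> int:
    """Sum the per-column Fitch costs over the whole alignment."""
    n_sites = len(next(iter(seqs.values())))
    return sum(
        fitch_site(tree, {name: seq[i] for name, seq in seqs.items()})[1]
        for i in range(n_sites)
    )


# Toy alignment (assumed data, not from the paper).
seqs = {"A": "ACGT", "B": "ACTT", "C": "TCGA", "D": "TCGA"}
tree_1 = (("A", "B"), ("C", "D"))  # groups similar samples together
tree_2 = (("A", "C"), ("B", "D"))  # splits them apart

print(parsimony_score(tree_1, seqs))  # 3
print(parsimony_score(tree_2, seqs))  # 5
```

Under this criterion `tree_1` is preferred because it explains the same data with fewer mutations. Scoring only needs integer counts per site, so it is far cheaper than evaluating a likelihood model, which is one general reason parsimony is attractive for very large trees.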

AVAILABILITY AND IMPLEMENTATION

The matOptimize code is freely available as part of the UShER package (https://github.com/yatisht/usher) and can also be installed via bioconda (https://bioconda.github.io/recipes/usher/README.html). All scripts we used to perform the experiments in this manuscript are available at https://github.com/yceh/matOptimize-experiments.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1e36/9344837/21f34eebb6f6/btac401f3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1e36/9344837/da7797a8afea/btac401f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1e36/9344837/8a2d527e8bed/btac401f2.jpg
