matOptimize: a parallel tree optimization method enables online phylogenetics for SARS-CoV-2.

Affiliations

Department of Electrical and Computer Engineering, University of California, San Diego, San Diego, CA 92093, USA.

Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, CA 95064, USA.

Publication information

Bioinformatics. 2022 Aug 2;38(15):3734-3740. doi: 10.1093/bioinformatics/btac401.

Abstract

MOTIVATION

Phylogenetic tree optimization is necessary for precise analysis of evolutionary and transmission dynamics, but existing tools are inadequate for the scale and pace of data produced during the coronavirus disease 2019 (COVID-19) pandemic. One transformative approach, online phylogenetics, aims to incrementally add samples to an ever-growing phylogeny, but no previous approach can efficiently optimize this vast phylogeny under the time constraints of the pandemic.

RESULTS

Here, we present matOptimize, a fast and memory-efficient parsimony-based phylogenetic tree optimization tool that can be parallelized across multiple CPU threads and nodes and provides orders-of-magnitude improvements in runtime and peak memory usage over existing state-of-the-art methods. We developed this method specifically to address the pressing need during the COVID-19 pandemic for daily maintenance and optimization of a comprehensive SARS-CoV-2 phylogeny. matOptimize currently helps refine, on a daily basis, what is possibly the largest-ever phylogenetic tree, containing millions of SARS-CoV-2 sequences.
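
To make "based on parsimony" concrete, the sketch below computes a parsimony score with the classic Fitch algorithm on a toy four-sample tree. It is an illustration only and is not matOptimize's implementation: the nested-tuple tree encoding, the function names fitch_score and total_parsimony, and the sample names and sequences are all hypothetical, and the recursive traversal is only suitable for very small trees.

```python
# Illustrative only: Fitch small-parsimony scoring on a rooted binary tree.
# This is NOT matOptimize's code; the tree encoding and all names are assumptions.

def fitch_score(tree, leaf_states):
    """Return the minimum number of state changes for one alignment site.

    tree: nested 2-tuples for internal nodes, strings for leaf names.
    leaf_states: dict mapping leaf name -> observed nucleotide at this site.
    """
    def visit(node):
        if isinstance(node, str):  # leaf: its state set is the observed base
            return {leaf_states[node]}, 0
        left, right = node
        left_set, left_cost = visit(left)
        right_set, right_cost = visit(right)
        common = left_set & right_set
        if common:  # children agree on at least one state: no extra change
            return common, left_cost + right_cost
        # children disagree: take the union and count one mutation
        return left_set | right_set, left_cost + right_cost + 1

    _, score = visit(tree)
    return score


def total_parsimony(tree, alignment):
    """Sum per-site Fitch scores; alignment maps leaf -> equal-length sequence."""
    length = len(next(iter(alignment.values())))
    return sum(
        fitch_score(tree, {name: seq[i] for name, seq in alignment.items()})
        for i in range(length)
    )


# Two candidate topologies for the same four hypothetical samples.
tree_a = (("s1", "s2"), ("s3", "s4"))
tree_b = (("s1", "s3"), ("s2", "s4"))
alignment = {"s1": "AAG", "s2": "AAG", "s3": "CTG", "s4": "CTG"}

print(total_parsimony(tree_a, alignment), total_parsimony(tree_b, alignment))
```

Under the parsimony criterion the topology with the lower score (here tree_a) is preferred, so a parsimony-based optimizer searches for tree rearrangements that reduce this score.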

AVAILABILITY AND IMPLEMENTATION

The matOptimize code is freely available as part of the UShER package (https://github.com/yatisht/usher) and can also be installed via bioconda (https://bioconda.github.io/recipes/usher/README.html). All scripts we used to perform the experiments in this manuscript are available at https://github.com/yceh/matOptimize-experiments.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
