Many-core algorithms for high-dimensional gradients on phylogenetic trees.

Author affiliations

Department of Biomathematics, David Geffen School of Medicine at UCLA, University of California, Los Angeles, Los Angeles, CA, United States.

Department of Mathematics, School of Science & Engineering, Tulane University, New Orleans, LA, United States.

Publication information

Bioinformatics. 2024 Feb 1;40(2). doi: 10.1093/bioinformatics/btae030.

DOI: 10.1093/bioinformatics/btae030
PMID: 38243701
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10868298/
Abstract

MOTIVATION

Advancements in high-throughput genomic sequencing are delivering genomic pathogen data at an unprecedented rate, positioning statistical phylogenetics as a critical tool to monitor infectious diseases globally. This rapid growth spurs the need for efficient inference techniques, such as Hamiltonian Monte Carlo (HMC) in a Bayesian framework, to estimate parameters of these phylogenetic models where the dimensions of the parameters increase with the number of sequences N. HMC requires repeated calculation of the gradient of the data log-likelihood with respect to (wrt) all branch-length-specific (BLS) parameters that traditionally takes O(N^2) operations using the standard pruning algorithm. A recent study proposes an approach to calculate this gradient in O(N), enabling researchers to take advantage of gradient-based samplers such as HMC. The CPU implementation of this approach makes the calculation of the gradient computationally tractable for nucleotide-based models but falls short in performance for larger state-space size models, such as Markov-modulated and codon models. Here, we describe novel massively parallel algorithms to calculate the gradient of the log-likelihood wrt all BLS parameters that take advantage of graphics processing units (GPUs) and result in many-fold higher speedups over previous CPU implementations.
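
To make the O(N) claim concrete, the following is a minimal, single-site CPU sketch in pure Python. It is not the BEAGLE GPU implementation. The abstract does not spell out the linear-time method, so the sketch assumes one common formulation: a post-order pass (the pruning algorithm) followed by a pre-order pass, after which each branch derivative costs a constant amount of work. The JC69 substitution model, the `Node` class, and all function names are illustrative assumptions, and the tree is assumed to be binary.

```python
import math

STATES = "ACGT"

def jc69(t):
    """JC69 transition matrix P(t) and its derivative dP/dt (assumed model)."""
    e = math.exp(-4.0 * t / 3.0)
    P = [[0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e for j in range(4)]
         for i in range(4)]
    dP = [[-e if i == j else e / 3.0 for j in range(4)] for i in range(4)]
    return P, dP

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(4)) for i in range(4)]

def vecmat(v, M):
    return [sum(v[i] * M[i][j] for i in range(4)) for j in range(4)]

class Node:
    """Hypothetical tree node; `length` is the branch leading to the parent."""
    def __init__(self, length=0.0, children=(), state=None):
        self.length = length
        self.children = list(children)
        self.state = state  # observed nucleotide at a tip, None otherwise

def post_order(node):
    """Pruning pass: node.post[s] = P(data below node | node state = s)."""
    if not node.children:
        node.post = [1.0 if s == node.state else 0.0 for s in STATES]
        return
    node.post = [1.0] * 4
    for child in node.children:
        post_order(child)
        P, _ = jc69(child.length)
        child.msg = matvec(P, child.post)  # child's data given parent state
        node.post = [a * b for a, b in zip(node.post, child.msg)]

def pre_order(node, top):
    """top[a] = P(data outside node's subtree, node state = a)."""
    for child in node.children:
        # child.pre[a] = P(data outside child's subtree, parent state = a)
        child.pre = list(top)
        for sib in node.children:  # one sibling per child in a binary tree
            if sib is not child:
                child.pre = [a * b for a, b in zip(child.pre, sib.msg)]
        P, _ = jc69(child.length)
        pre_order(child, vecmat(child.pre, P))

def log_likelihood_and_gradient(root):
    """Return log L and d(log L)/d(branch length) for every non-root node."""
    post_order(root)
    pre_order(root, [0.25] * 4)  # JC69 stationary distribution at the root
    lik = sum(0.25 * p for p in root.post)
    grad = {}
    stack = [root]
    while stack:
        node = stack.pop()
        for child in node.children:
            _, dP = jc69(child.length)
            d = sum(a * b for a, b in zip(child.pre, matvec(dP, child.post)))
            grad[child] = d / lik
            stack.append(child)
    return math.log(lik), grad

# Small example tree: ((A:0.10, C:0.20):0.05, (A:0.30, G:0.15):0.07)
tips = [Node(0.10, state="A"), Node(0.20, state="C"),
        Node(0.30, state="A"), Node(0.15, state="G")]
left = Node(0.05, children=tips[:2])
right = Node(0.07, children=tips[2:])
root = Node(children=[left, right])
loglik, grad = log_likelihood_and_gradient(root)
```

Each node is visited a constant number of times, so the whole gradient costs O(N) for a binary tree. A naive alternative reruns the pruning pass once per branch, which is where the O(N^2) cost comes from. In a real analysis the same two passes are repeated for every alignment site and for larger state spaces such as codon models, which is the workload the abstract targets with GPUs.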

RESULTS

We benchmark these GPU algorithms on three computing systems using three evolutionary inference examples exploring complete genomes from 997 dengue viruses, 62 carnivore mitochondria and 49 yeasts, and observe a >128-fold speedup over the CPU implementation for codon-based models and >8-fold speedup for nucleotide-based models. As a practical demonstration, we also estimate the timing of the first introduction of West Nile virus into the continental United States under a codon model with a relaxed molecular clock from 104 full viral genomes, an inference task previously intractable.

AVAILABILITY AND IMPLEMENTATION

We provide an implementation of our GPU algorithms in BEAGLE v4.0.0 (https://github.com/beagle-dev/beagle-lib), an open-source library for statistical phylogenetics that enables parallel calculations on multi-core CPUs and GPUs. We employ a BEAGLE-implementation using the Bayesian phylogenetics framework BEAST (https://github.com/beast-dev/beast-mcmc).
