Adaptive MCMC in Bayesian phylogenetics: an application to analyzing partitioned data in BEAST.

Author information

Baele Guy, Lemey Philippe, Rambaut Andrew, Suchard Marc A

Affiliations

Department of Microbiology and Immunology, Rega Institute, KU Leuven, Leuven, Belgium.

Institute of Evolutionary Biology, University of Edinburgh, Edinburgh, UK.

Publication information

Bioinformatics. 2017 Jun 15;33(12):1798-1805. doi: 10.1093/bioinformatics/btx088.

DOI:10.1093/bioinformatics/btx088

PMID:28200071

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6044345/

Abstract

MOTIVATION

Advances in sequencing technology continue to deliver increasingly large molecular sequence datasets that are often heavily partitioned in order to accurately model the underlying evolutionary processes. In phylogenetic analyses, partitioning strategies involve estimating conditionally independent models of molecular evolution for different genes and for different positions within those genes. The resulting large number of evolutionary parameters increases the computational burden of such analyses. The past two decades have also seen the rise of multi-core processors, in both the central processing unit (CPU) and graphics processing unit (GPU) markets, enabling massively parallel computations that are not yet fully exploited by many software packages for multipartite analyses.

RESULTS

Here we propose a Markov chain Monte Carlo (MCMC) approach that uses an adaptive multivariate transition kernel to estimate in parallel a large number of parameters, split across partitioned data, by exploiting multi-core processing. Across several real-world examples, we demonstrate that our approach estimates these multipartite parameters more efficiently than standard approaches, which typically use a mixture of univariate transition kernels. In one case, when estimating the relative rate parameter of the non-coding partition in a heterochronous dataset, MCMC integration efficiency improves more than 14-fold.
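To make the idea of an adaptive multivariate transition kernel concrete, the following is a minimal sketch of a generic adaptive Metropolis sampler in the style of Haario et al. (2001). It is not the BEAST implementation and makes no claim about how BEAST adapts or parallelizes its kernel. The toy target (a strongly correlated bivariate normal), the function names `log_target` and `adaptive_metropolis`, the warm-up length, and the 2.38²/d scaling are all assumptions chosen for illustration. The point it shows is that a joint proposal whose covariance is learned from the chain history can move along correlated directions that one-parameter-at-a-time updates explore slowly.

```python
import math
import random


def log_target(x, y, rho=0.9):
    """Unnormalized log density of a standard bivariate normal with correlation rho."""
    return -(x * x - 2.0 * rho * x * y + y * y) / (2.0 * (1.0 - rho * rho))


def adaptive_metropolis(n_iter=20000, warmup=500, seed=1):
    """Random-walk Metropolis whose 2-D proposal covariance tracks the chain history."""
    rng = random.Random(seed)
    scale = 2.38 ** 2 / 2.0   # common 2.38^2 / d scaling, here d = 2 (assumption)
    eps = 1e-6                # jitter that keeps the covariance positive definite
    x, y = 0.0, 0.0
    logp = log_target(x, y)
    n, mean_x, mean_y = 0, 0.0, 0.0   # running mean of visited states
    sxx = sxy = syy = 0.0             # running sums of squared deviations
    samples, accepted = [], 0

    for i in range(n_iter):
        if i < warmup or n < 2:
            # Fixed, uncorrelated proposal until enough history has accumulated.
            l11, l21, l22 = 0.1, 0.0, 0.1
        else:
            # Cholesky factor of the scaled empirical covariance.
            cxx = scale * (sxx / (n - 1) + eps)
            cxy = scale * sxy / (n - 1)
            cyy = scale * (syy / (n - 1) + eps)
            l11 = math.sqrt(cxx)
            l21 = cxy / l11
            l22 = math.sqrt(max(cyy - l21 * l21, 1e-12))

        # Joint (multivariate) proposal: both coordinates move together.
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        prop_x = x + l11 * z1
        prop_y = y + l21 * z1 + l22 * z2
        prop_logp = log_target(prop_x, prop_y)

        # Symmetric proposal, so the Metropolis ratio is just the target ratio.
        if rng.random() < math.exp(min(0.0, prop_logp - logp)):
            x, y, logp = prop_x, prop_y, prop_logp
            accepted += 1

        # Welford-style update of the running mean and covariance sums.
        n += 1
        dx, dy = x - mean_x, y - mean_y
        mean_x += dx / n
        mean_y += dy / n
        sxx += dx * (x - mean_x)
        sxy += dx * (y - mean_y)
        syy += dy * (y - mean_y)
        samples.append((x, y))

    return samples, accepted / n_iter
```

Calling `adaptive_metropolis()` returns the list of sampled `(x, y)` states and the overall acceptance rate. In a real partitioned analysis the target would be the phylogenetic posterior and the parameter vector would be much larger than two, so this sketch only illustrates the adaptation step, not the parallel likelihood evaluation across partitions.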

AVAILABILITY AND IMPLEMENTATION

Our implementation is part of the BEAST code base, a widely used open-source software package for Bayesian phylogenetic inference.

CONTACT

guy.baele@kuleuven.be.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

