Choosing among partition models in Bayesian phylogenetics.

Affiliation

Department of Ecology and Evolutionary Biology, University of Connecticut.

Publication information

Mol Biol Evol. 2011 Jan;28(1):523-32. doi: 10.1093/molbev/msq224. Epub 2010 Aug 27.

DOI:10.1093/molbev/msq224

PMID:20801907

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3002242/

Abstract

Bayesian phylogenetic analyses often depend on Bayes factors (BFs) to determine the optimal way to partition the data. The marginal likelihoods used to compute BFs, in turn, are most commonly estimated using the harmonic mean (HM) method, which has been shown to be inaccurate. We describe a new, more accurate method for estimating the marginal likelihood of a model and compare it with the HM method on both simulated and empirical data. The new method generalizes our previously described stepping-stone (SS) approach by making use of a reference distribution parameterized using samples from the posterior distribution. This avoids one challenging aspect of the original SS method, namely the need to sample from distributions that are close (in the Kullback-Leibler sense) to the prior. We specifically address the choice of partition models and find that using the HM method can lead to a strong preference for an overpartitioned model. Using simulated data, we show that the generalized SS method is strikingly more precise than both the HM method and the original SS method (it gives repeatable BF values for the same data and partition model) and yields BF values that are much more reasonable than those produced by the HM method. Comparisons of HM and generalized SS methods on an empirical data set demonstrate that the generalized SS method tends to choose simpler partition schemes that are more in line with expectation based on inferred patterns of molecular evolution. The generalized SS method shares with thermodynamic integration the need to sample from a series of distributions in addition to the posterior. Such dedicated path-based Markov chain Monte Carlo analyses appear to be a cost of estimating marginal likelihoods accurately.
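To make the contrast between the two estimators concrete, here is a minimal sketch on a toy model rather than a phylogenetic one. Everything in it is an assumption chosen for illustration: the data are normal with unknown mean and known variance 1, the prior on the mean is normal, the reference distribution is a normal fitted to posterior draws, the power-posterior path uses evenly spaced beta values, and each distribution on the path is sampled exactly instead of by MCMC. The function names (`make_model`, `harmonic_mean`, `generalized_ss`, `compare`) are hypothetical and do not come from the paper or from any software package. The toy model is convenient because its log marginal likelihood is known exactly, so both estimates can be checked.

```python
import math
import random


def log_normal_pdf(x, mean, var):
    """Log density of a normal distribution with the given mean and variance."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def log_mean_exp(values):
    """Numerically stable log of the arithmetic mean of exp(values)."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))


def make_model(y, prior_var):
    """Toy model (an assumption): y_i ~ N(mu, 1), mu ~ N(0, prior_var)."""
    post_var = 1.0 / (len(y) + 1.0 / prior_var)
    post_mean = post_var * sum(y)

    def log_lik(mu):
        return sum(log_normal_pdf(yi, mu, 1.0) for yi in y)

    def log_prior(mu):
        return log_normal_pdf(mu, 0.0, prior_var)

    # Exact log marginal likelihood: likelihood * prior / posterior at mu = 0.
    exact = log_lik(0.0) + log_prior(0.0) - log_normal_pdf(0.0, post_mean, post_var)
    return log_lik, log_prior, post_mean, post_var, exact


def harmonic_mean(log_lik, post_draws):
    """HM estimate: harmonic mean of likelihoods over posterior draws."""
    return -log_mean_exp([-log_lik(mu) for mu in post_draws])


def generalized_ss(log_lik, log_prior, post_mean, post_var, post_draws,
                   rng, n_steps=10, n_draws=500):
    """Stepping-stone estimate along a path from a reference distribution
    (beta = 0) to the posterior (beta = 1)."""
    # Reference distribution: a normal parameterized from posterior draws.
    ref_mean = sum(post_draws) / len(post_draws)
    ref_var = sum((d - ref_mean) ** 2 for d in post_draws) / (len(post_draws) - 1)
    betas = [k / n_steps for k in range(n_steps + 1)]
    total = 0.0
    for b_prev, b_next in zip(betas, betas[1:]):
        # posterior^b * reference^(1-b) is itself normal in this toy model,
        # so it is sampled exactly here; a real analysis would run MCMC.
        prec = b_prev / post_var + (1 - b_prev) / ref_var
        mean = (b_prev * post_mean / post_var + (1 - b_prev) * ref_mean / ref_var) / prec
        draws = [rng.gauss(mean, math.sqrt(1.0 / prec)) for _ in range(n_draws)]
        # Each step estimates the ratio of adjacent normalizing constants.
        log_terms = [
            (b_next - b_prev)
            * (log_lik(mu) + log_prior(mu) - log_normal_pdf(mu, ref_mean, ref_var))
            for mu in draws
        ]
        total += log_mean_exp(log_terms)
    return total


def compare(seed, n_obs=50, prior_var=100.0, n_post=2000):
    """Return (exact, HM estimate, generalized SS estimate) on simulated data."""
    rng = random.Random(seed)
    y = [rng.gauss(0.3, 1.0) for _ in range(n_obs)]
    log_lik, log_prior, post_mean, post_var, exact = make_model(y, prior_var)
    post_draws = [rng.gauss(post_mean, math.sqrt(post_var)) for _ in range(n_post)]
    hm = harmonic_mean(log_lik, post_draws)
    ss = generalized_ss(log_lik, log_prior, post_mean, post_var, post_draws, rng)
    return exact, hm, ss


if __name__ == "__main__":
    for seed in range(5):
        exact, hm, ss = compare(seed)
        print(f"seed={seed} exact={exact:.3f} HM={hm:.3f} SS={ss:.3f}")
```

Running `compare` with several seeds gives a rough feel for how each estimate scatters around the exact value. Because the fitted reference distribution is close to the posterior, the quantity averaged at each stepping stone is nearly constant, which is the intuition behind the repeatability the abstract reports. The log Bayes factor between two models would be the difference of their estimated log marginal likelihoods.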
