Barido-Sottani Joëlle, Schwery Orlando, Warnock Rachel C M, Zhang Chi, Wright April Marie
Institut de Biologie de l'ENS (IBENS), École normale supérieure, CNRS, INSERM, Université PSL, Paris, Île-de-France, 75005, France.
Department of Biological Sciences, Southeastern Louisiana University, Hammond, Louisiana, 70402, USA.
Open Res Eur. 2024 Aug 5;3:204. doi: 10.12688/openreseurope.16679.1. eCollection 2023.
Phylogenetic estimation is, and has always been, a complex endeavor. Estimating a phylogenetic tree involves evaluating many possible evolutionary histories that could explain a set of observed data, typically by using a model of evolution. Values for all model parameters need to be evaluated as well. Modern statistical methods involve not just the estimation of a tree, but also solutions to more complex models that incorporate fossil record information and other data sources. Markov chain Monte Carlo (MCMC) is a leading method for approximating the posterior distribution of parameters in a mathematical model. It is deployed in all Bayesian phylogenetic tree estimation software. While many researchers use MCMC in phylogenetic analyses, interpreting results and diagnosing problems with MCMC remain vexing issues for many biologists. In this manuscript, we offer an overview of how MCMC is used in Bayesian phylogenetic inference, with a particular emphasis on complex hierarchical models, such as the fossilized birth-death (FBD) model. We discuss strategies to diagnose common MCMC problems and troubleshoot difficult analyses, in particular convergence issues. We show how the study design, the choice of models and priors, and the technical features of the inference tools themselves can all be adjusted to obtain the best results. Finally, we discuss the unique challenges created by the incorporation of fossil information in phylogenetic inference, and present tips to address them.