Department of Microbiology, Immunology and Transplantation, Rega Institute, KU Leuven, Herestraat 49, 3000 Leuven, Belgium.
Department of Biostatistics, Jonathan and Karin Fielding School of Public Health, University of California, Los Angeles, CA 90095, USA.
Syst Biol. 2021 Jan 1;70(1):181-189. doi: 10.1093/sysbio/syaa037.
Markov models of character substitution on phylogenies form the foundation of phylogenetic inference frameworks. Early models made the simplifying assumption that the substitution process is homogeneous over time and across sites in the molecular sequence alignment. While standard practice adopts extensions that accommodate heterogeneity of substitution rates across sites, heterogeneity in the process over time in a site-specific manner remains frequently overlooked. This is problematic, as evolutionary processes that act at the molecular level are highly variable, subjecting different sites to different selective constraints over time, impacting their substitution behavior. We propose incorporating time variability through Markov-modulated models (MMMs), which extend covarion-like models and allow the substitution process (including relative character exchange rates as well as the overall substitution rate) at individual sites to vary across lineages. We implement a general MMM framework in BEAST, a popular Bayesian phylogenetic inference software package, allowing researchers to compose a wide range of MMMs through flexible XML specification. Using examples from bacterial, viral, and plastid genome evolution, we show that MMMs impact phylogenetic tree estimation and can substantially improve model fit compared to standard substitution models. Through simulations, we show that marginal likelihood estimation accurately identifies the generative model and does not systematically prefer the more parameter-rich MMMs. To mitigate the increased computational demands associated with MMMs, our implementation exploits recent developments in BEAGLE, a high-performance computational library for phylogenetic inference.
Keywords: Bayesian inference; BEAGLE; BEAST; covarion; heterotachy; Markov-modulated models; phylogenetics.
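As a rough illustration of how a Markov-modulated model lets the substitution process at a site change along a lineage, the sketch below builds a generator on an expanded state space of (hidden class, character) pairs: each hidden class carries its own scaled substitution matrix, and a separate switching matrix moves a site between classes without changing the observed character. This is a textbook-style construction written for illustration, not the BEAST implementation; the function names, the Jukes-Cantor components, and all numeric rates are assumptions.

```python
def jc69(n=4):
    """Jukes-Cantor rate matrix with unit total leaving rate per state."""
    off = 1.0 / (n - 1)
    return [[-1.0 if i == j else off for j in range(n)] for i in range(n)]


def mmm_generator(components, rates, switching):
    """Build the generator on the expanded (hidden class, character) space.

    components: list of K n-by-n substitution rate matrices (one per class)
    rates:      list of K relative rate multipliers
    switching:  K-by-K rate matrix for moves between hidden classes
    State (class k, character i) is stored at index k * n + i.
    """
    k_classes = len(components)
    n = len(components[0])
    size = k_classes * n
    gen = [[0.0] * size for _ in range(size)]
    for k in range(k_classes):
        for i in range(n):
            row = k * n + i
            # Within-class substitutions: scaled block on the diagonal.
            for j in range(n):
                gen[row][k * n + j] += rates[k] * components[k][i][j]
            # Class switches keep the observed character fixed.
            for l in range(k_classes):
                gen[row][l * n + i] += switching[k][l]
    return gen


# Hypothetical two-class example: a slow class and a fast class.
# Setting the slow rate to 0.0 would give a covarion-style "off" class.
Q = jc69()
switching = [[-0.1, 0.1], [0.2, -0.2]]
G = mmm_generator([Q, Q], [0.2, 2.0], switching)
row_sums = [sum(row) for row in G]
```

Each row of `G` sums to zero (up to floating-point error), so it is a valid rate matrix on the eight expanded states. Using different component matrices per class, rather than the same `Q` twice, would let relative exchange rates vary over time as well as the overall rate.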