Ruano-Rubio Valentin, Fares Mario A
Molecular Evolution and Bioinformatics Laboratory, Department of Biology, National University of Ireland, Maynooth, Ireland.
Syst Biol. 2007 Feb;56(1):68-82. doi: 10.1080/10635150601175578.
Despite advances in our understanding of molecular evolution, current phylogenetic methods capture only a fraction of the complexity of evolution. We are constrained chiefly by our incomplete knowledge of molecular evolutionary processes and by the limits of computational power. These limitations lead either to biologically simplistic models that ignore much of that complexity or to overfitted models that add little resolution to the problem. Oversimplified models may lead us to assign high confidence to an incorrect tree (inconsistency). Rate-across-site (RAS) models are commonly used evolutionary models in phylogenetic studies. They account for heterogeneity in evolutionary rates among sites but not for changes in within-site rates across lineages (heterotachy). If heterotachy is common, using RAS models may lead to systematic errors in tree inference. In this work we show how maximum likelihood tree inference can be misled when the assumption of constant within-site rates across lineages is violated. Using a simulation study, we explore the ways in which stationary gamma models can lead to a wrong topology or to deceptive bootstrap support values when within-site rates change across lineages. More precisely, we show that different degrees of heterotachy mislead phylogenetic inference when the assumed model is stationary. Finally, we propose a geometry-based approach to visualize and test for possible bias due to heterotachy.
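The distinction between RAS and heterotachy can be illustrated with a minimal sketch. This is not the simulation design used in the study: the function names, the `switch_prob` parameter, and the rule of redrawing a lineage's rate independently are all assumptions made for illustration. Under RAS, each site draws one gamma-distributed rate (mean 1) that every lineage shares; under the toy heterotachous scheme, each lineage may replace its rate at a site with a fresh draw.

```python
import random


def ras_rates(n_sites, n_lineages, alpha, rng):
    """RAS: one gamma rate per site (mean 1), shared by every lineage."""
    rates = []
    for _ in range(n_sites):
        r = rng.gammavariate(alpha, 1.0 / alpha)  # shape alpha, scale 1/alpha
        rates.append([r] * n_lineages)
    return rates


def heterotachous_rates(n_sites, n_lineages, alpha, switch_prob, rng):
    """Toy heterotachy (hypothetical scheme): start from RAS rates, then let
    each lineage redraw its rate at a site with probability switch_prob."""
    rates = ras_rates(n_sites, n_lineages, alpha, rng)
    for site in rates:
        for j in range(n_lineages):
            if rng.random() < switch_prob:
                site[j] = rng.gammavariate(alpha, 1.0 / alpha)
    return rates


def fraction_heterotachous(rates):
    """Fraction of sites whose rate is not identical across all lineages."""
    varying = sum(1 for site in rates if len(set(site)) > 1)
    return varying / len(rates)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
ras = ras_rates(1000, 4, 0.5, rng)
het = heterotachous_rates(1000, 4, 0.5, 0.3, rng)
print("RAS, sites with lineage-specific rates:", fraction_heterotachous(ras))
print("Heterotachy, sites with lineage-specific rates:", fraction_heterotachous(het))
```

Setting `switch_prob` to 0 recovers the RAS case, so the parameter acts as a rough dial for the degree of heterotachy. A stationary gamma model fitted to data generated with a nonzero `switch_prob` would be misspecified in the sense the abstract describes.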