Hilton Sarah K, Bloom Jesse D
Basic Sciences and Computational Biology Program, Fred Hutchinson Cancer Research Center.
Department of Genome Sciences, University of Washington, USA.
Virus Evol. 2018 Nov 6;4(2):vey033. doi: 10.1093/ve/vey033. eCollection 2018 Jul.
Molecular phylogenetics is often used to estimate the time since the divergence of modern gene sequences. For highly diverged sequences, such phylogenetic techniques sometimes estimate surprisingly recent divergence times. In the case of viruses, independent evidence indicates that the estimates of deep divergence times from molecular phylogenetics are sometimes too recent. This discrepancy is caused in part by inadequate models of purifying selection, which lead to underestimated branch lengths. Here we examine the effect on branch-length estimation of using models that incorporate experimental measurements of purifying selection. We find that models informed by experimentally measured site-specific amino-acid preferences estimate longer deep branches on phylogenies of influenza virus hemagglutinin. This lengthening of branches is due to the more realistic stationary states of the models, and is mostly independent of the branch-length extension from modeling site-to-site variation in amino-acid substitution rate. The branch-length extension from experimentally informed site-specific models is similar to that achieved by other approaches that allow the stationary state to vary across sites. However, the improvements from all of these site-specific but time-homogeneous and site-independent models are limited by the fact that a protein's amino-acid preferences gradually shift as it evolves. Overall, our work underscores the importance of modeling site-specific amino-acid preferences when estimating deep divergence times, but also shows the inherent limitations of approaches that fail to account for how these preferences shift over time.
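Why a more realistic stationary state lengthens branches can be illustrated with a deliberately simplified sketch. It does not use the experimentally informed models described above; it swaps in an F81-style (equal-input) amino-acid model, in which the fraction of differing sites saturates at B = 1 - Σπ², where π is the stationary distribution. Inverting p = B(1 - e^(-t/B)) gives the branch length t = -B ln(1 - p/B). The function names and the "constrained" site that tolerates only three amino acids are hypothetical, chosen only for illustration.

```python
import math


def stationary_heterozygosity(prefs):
    """Probability that two independent draws from the stationary
    distribution differ: B = 1 - sum(pi_a^2). prefs are normalized to pi."""
    total = sum(prefs)
    pi = [p / total for p in prefs]
    return 1.0 - sum(x * x for x in pi)


def f81_branch_length(p_diff, prefs):
    """Branch length (expected substitutions per site) implied by an observed
    fraction of differing sites p_diff, under a simplified F81-style
    equal-input model whose stationary distribution is given by prefs."""
    b = stationary_heterozygosity(prefs)
    if not 0.0 <= p_diff < b:
        raise ValueError("p_diff must lie in [0, B); the model saturates at B")
    return -b * math.log(1.0 - p_diff / b)


# Hypothetical stationary states for illustration only.
uniform = [1.0] * 20                   # every amino acid equally preferred
constrained = [1.0] * 3 + [0.0] * 17   # a site tolerating only 3 amino acids

for name, prefs in [("uniform", uniform), ("constrained", constrained)]:
    print(name, round(f81_branch_length(0.5, prefs), 3))
```

For the same observed divergence (half the sites differ), the uniform stationary state implies a branch length of about 0.71 substitutions per site, whereas the constrained one implies about 0.92. A model that ignores site-specific constraints therefore treats saturated sites as less diverged than they are, which is the direction of bias described in the abstract.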