Department of Biological Sciences, University of Texas at Dallas, Richardson, TX 75080.
Department of Mathematical Sciences, University of Texas at Dallas, Richardson, TX 75080.
Proc Natl Acad Sci U S A. 2020 Mar 17;117(11):5873-5882. doi: 10.1073/pnas.1913071117. Epub 2020 Mar 2.
We introduce a model of amino acid sequence evolution that accounts for the statistical behavior of real sequences induced by epistatic interactions. The model dynamics are based on parameters inferred from multiple sequence alignments by direct coupling analysis. Known statistical properties such as overdispersion, heterotachy, and gamma-distributed rates across sites are shown to be emergent properties of this model while remaining consistent with neutral evolution theory, thereby unifying observations from previously disjoint models of sequence evolution. We characterize the relationship between site restriction and heterotachy by tracking the effective alphabet dynamics of sites. We also observe an evolutionary Stokes shift in the fitness of sequences that have evolved under our simulation. By analyzing the structural information of some proteins, we corroborate that the strongest Stokes shifts derive from sites that physically interact in networks near biochemically important regions. We discuss perspectives on implementing our model in the context of the molecular clock.
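The abstract does not spell out the simulation procedure, so the following is only a minimal sketch of how evolution under a pairwise (Potts-type) model of the kind produced by direct coupling analysis might be simulated. Everything in it is an assumption made for illustration: the fields h and couplings J are random placeholders rather than values inferred from an alignment, the single-site resampling rule is a generic Gibbs-style update that may differ from the authors' dynamics, and the effective alphabet is taken here to be the exponential of the Shannon entropy of a site's conditional distribution. All function names are hypothetical.

```python
import math
import random

Q = 21  # assumed alphabet size: 20 amino acids plus a gap symbol


def random_potts(length, q=Q, scale=0.5, seed=0):
    """Toy fields h[i][a] and couplings J[(i, j)][a][b] for i < j.

    Placeholders only: a real study would infer these from an alignment.
    """
    rng = random.Random(seed)
    h = [[rng.gauss(0, scale) for _ in range(q)] for _ in range(length)]
    J = {}
    for i in range(length):
        for j in range(i + 1, length):
            J[(i, j)] = [[rng.gauss(0, scale) for _ in range(q)] for _ in range(q)]
    return h, J


def coupling(J, i, a, j, b):
    """Coupling between residue a at site i and residue b at site j."""
    return J[(i, j)][a][b] if i < j else J[(j, i)][b][a]


def energy(seq, h, J):
    """Potts energy H = -sum_i h_i(s_i) - sum_{i<j} J_ij(s_i, s_j)."""
    e = -sum(h[i][a] for i, a in enumerate(seq))
    for (i, j), block in J.items():
        e -= block[seq[i]][seq[j]]
    return e


def site_probabilities(seq, i, h, J, q=Q):
    """Conditional distribution of the residue at site i given all other sites."""
    logits = []
    for a in range(q):
        s = h[i][a]
        s += sum(coupling(J, i, a, j, seq[j]) for j in range(len(seq)) if j != i)
        logits.append(s)
    m = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(x - m) for x in logits]
    z = sum(weights)
    return [w / z for w in weights]


def effective_alphabet(probs):
    """exp(Shannon entropy): an effective count of residues a site can take."""
    return math.exp(-sum(p * math.log(p) for p in probs if p > 0))


def evolve(seq, h, J, steps, q=Q, seed=1):
    """Resample one random site per step; count the substitutions that occur."""
    rng = random.Random(seed)
    seq = list(seq)
    substitutions = 0
    for _ in range(steps):
        i = rng.randrange(len(seq))
        probs = site_probabilities(seq, i, h, J, q)
        new = rng.choices(range(q), weights=probs)[0]
        if new != seq[i]:
            substitutions += 1
            seq[i] = new
    return seq, substitutions
```

In this sketch, a site whose conditional distribution is concentrated on a few residues has a small effective alphabet and rarely accepts substitutions. Because that distribution depends on the residues at every other site, the value can change as the rest of the sequence evolves, which is one way to picture the link between site restriction and heterotachy described in the abstract.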