Miyazawa Sanzo
6-5-607 Miyanodai, Sakura, Chiba 285-0857, Japan.
J Theor Biol. 2017 Nov 21;433:21-38. doi: 10.1016/j.jtbi.2017.08.018. Epub 2017 Aug 24.
Assuming that mutation and fixation processes are reversible Markov processes, we prove that the equilibrium ensemble of sequences obeys a Boltzmann distribution with exp(4N_e m(1 - 1/(2N))), where m is Malthusian fitness and N_e and N are effective and actual population sizes. On the other hand, the probability distribution of sequences with maximum entropy that satisfies a given amino acid composition at each site and a given pairwise amino acid frequency at each site pair is a Boltzmann distribution with exp(-ψ_N), where the evolutionary statistical energy ψ_N is represented as the sum of one-body (h) (compositional) and pairwise (J) (covariational) interactions over all sites and site pairs. A protein folding theory based on the random energy model (REM) indicates that the equilibrium ensemble of natural protein sequences is well represented by a canonical ensemble characterized by exp(-ΔG_ND/(k_B T_s)), or by exp(-G_N/(k_B T_s)) if the amino acid composition is kept constant, where ΔG_ND ≡ G_N - G_D, G_N and G_D are the native and denatured free energies, and T_s is the effective temperature representing the strength of selection pressure. Thus, 4N_e m(1 - 1/(2N)), -Δψ_ND (≡ -ψ_N + ψ_D), and -ΔG_ND/(k_B T_s) must be equivalent to each other. With h and J estimated by the DCA program, the changes (Δψ_N) of ψ_N due to single nucleotide nonsynonymous substitutions are analyzed. The results indicate that the standard deviation of ΔG_N (= k_B T_s Δψ_N) is approximately constant irrespective of protein families, and therefore can be used to estimate the relative value of T_s. The glass transition temperature T_g and ΔG_ND are estimated from the estimated T_s and the experimental melting temperature (T_m) for 14 protein domains. The estimates of ΔG_ND agree with their experimental values for 5 proteins, and those of T_s and T_g are all within a reasonable range.
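As a rough illustration of the evolutionary statistical energy described above, the following Python sketch evaluates ψ for a toy sequence and the change Δψ caused by a single amino acid replacement. The arrays h and J are random placeholders, not DCA estimates. The sign convention ψ = -(Σ h + Σ J), the function names, and the use of an amino-acid-level replacement instead of a single nucleotide substitution are all assumptions made for this example.

```python
import numpy as np

def psi(seq, h, J):
    """Statistical energy: psi = -(sum_i h[i, a_i] + sum_{i<j} J[i, j, a_i, a_j])."""
    L = len(seq)
    one_body = sum(h[i, seq[i]] for i in range(L))
    pairwise = sum(J[i, j, seq[i], seq[j]]
                   for i in range(L) for j in range(i + 1, L))
    return -(one_body + pairwise)

def delta_psi(seq, site, new_state, h, J):
    """Change in psi when `site` is replaced by `new_state`.

    Only the terms that involve `site` are recomputed; J is read from its
    upper triangle (first index < second index), as in psi().
    """
    old = seq[site]
    change = h[site, new_state] - h[site, old]
    for j in range(len(seq)):
        if j == site:
            continue
        if site < j:
            change += J[site, j, new_state, seq[j]] - J[site, j, old, seq[j]]
        else:
            change += J[j, site, seq[j], new_state] - J[j, site, seq[j], old]
    return -change

# Toy model (hypothetical sizes and random parameters, for illustration only).
rng = np.random.default_rng(0)
L, q = 6, 20                        # sequence length, number of amino acid types
h = rng.normal(size=(L, q))         # one-body (compositional) terms
J = rng.normal(size=(L, L, q, q))   # pairwise (covariational) terms
seq = rng.integers(0, q, size=L)

site = 2
new_state = (seq[site] + 1) % q
mutated = seq.copy()
mutated[site] = new_state

d_fast = delta_psi(seq, site, new_state, h, J)
d_full = psi(mutated, h, J) - psi(seq, h, J)
print(d_fast, d_full)               # the two values should agree
```

Under the equivalence stated in the abstract, multiplying such a Δψ by k_B T_s would give the corresponding free energy change ΔG.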
In addition, approximating the probability density function (PDF) of Δψ_N by a log-normal distribution, PDFs of Δψ_N and K_a/K_s, which is the ratio of nonsynonymous to synonymous substitution rate per site, in all and in fixed mutants are estimated. The equilibrium values of ψ_N, at which the average of Δψ_N in fixed mutants is equal to zero, match well with ψ_N averaged over homologous sequences, confirming that the present methods for the fixation process of mutations and for the equilibrium ensemble of ψ_N give results consistent with each other. The PDFs of K_a/K_s at equilibrium confirm that T_s negatively correlates with the amino acid substitution rate (the mean of K_a/K_s) of a protein. Interestingly, stabilizing mutations are significantly fixed by positive selection, and balance with destabilizing mutations fixed by random drift, although most of them are removed from the population. Supporting the nearly neutral theory, neutral selection is not significant even in fixed mutants.
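The link between fixation and the Boltzmann factor exp(4N_e m(1 - 1/(2N))) can be checked numerically. The sketch below assumes Kimura's diffusion approximation for the fixation probability of a single new mutant with selective advantage s in a duploid population; whether this is exactly the form used in the paper is an assumption here, as are the function name and the parameter values. For this formula, the ratio of the fixation probabilities of a mutation and its reverse equals exp(4N_e s(1 - 1/(2N))), which is the detailed-balance condition behind a Boltzmann equilibrium.

```python
import math

def fixation_probability(s, N_e, N):
    """Kimura's diffusion approximation for one new mutant (initial frequency 1/(2N)).

    u(s) = (1 - exp(-4 N_e s q)) / (1 - exp(-4 N_e s)),  q = 1/(2N).
    """
    q = 1.0 / (2.0 * N)
    a = 4.0 * N_e * s
    if abs(a) < 1e-12:
        return q                      # neutral limit
    # expm1(x) = exp(x) - 1; the two minus signs cancel in the ratio.
    return math.expm1(-a * q) / math.expm1(-a)

# Hypothetical population parameters, for illustration only.
N_e, N, s = 1000, 2000, 0.001

u_forward = fixation_probability(s, N_e, N)    # advantageous (stabilizing) mutant
u_backward = fixation_probability(-s, N_e, N)  # reverse, deleterious mutant
u_neutral = fixation_probability(0.0, N_e, N)

ratio = u_forward / u_backward
boltzmann = math.exp(4 * N_e * s * (1 - 1 / (2 * N)))
print(ratio, boltzmann)                        # the two values should agree
```

The deleterious mutant still has a small but nonzero fixation probability, which is how destabilizing mutations can be fixed by random drift and balance stabilizing ones at equilibrium.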