Hurst L D, Williams E J
Department of Biology and Biochemistry, University of Bath, Claverton Down, Bath BA2 7AY, UK.
Gene. 2000 Dec 30;261(1):107-14. doi: 10.1016/s0378-1119(00)00489-3.
Many attempts to test selectionist and neutralist models employ estimates of synonymous (Ks) and non-synonymous (Ka) substitution rates of orthologous genes. For example, a stronger Ka-Ks correlation than expected under neutrality has been argued to indicate a role for selection, and the absence of a Ks-GC4 correlation has been argued to be inconsistent with neutral models for isochore evolution. However, as we have shown previously, both of these results are sensitive to the method by which Ka and Ks are estimated. Using a maximum likelihood (ML) estimator (GY94), we found a positive correlation between Ks and GC4 and only a weak correlation between Ka and Ks, lower than expected under neutrality. This ML method is computationally slow. Recently, a new ad hoc approximation of this ML method has been provided (YN00). It is effectively an extension of Li's protocol that also allows for codon usage bias. This method is computationally near-instantaneous and therefore potentially of great utility for the analysis of large datasets. Here we ask whether this method might have such applicability. To this end, we ask whether it too recovers the two unusual results. We report that when the ML and earlier ad hoc methods disagree, YN00 recovers the results described by the ML method, i.e. a positive correlation between GC4 and Ks and only a weak correlation between Ks and Ka. If the ML method can be trusted, then YN00 can also be considered an adequately reliable method for the analysis of large datasets. Assuming this to be so, we also analyze the patterns further. We show, for example, that the positive correlation between GC4 and Ks is probably in part due to a mutational bias, there being more methylation-induced CpG→TpG mutations in GC-rich regions. As regards the evolution of isochores, it seems inappropriate to use the claimed lack of a correlation between GC and Ks as definitive evidence either against or for any model. If the positive correlation is real then, we argue, it is hard to reconcile with the biased gene conversion model for isochore formation, as this model predicts a negative correlation.
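The kind of GC4-Ks comparison described above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' pipeline. It assumes that GC4 denotes the G+C fraction at the third position of fourfold degenerate codons under the standard genetic code, that Ks values have already been estimated elsewhere (for example by a GY94 or YN00 implementation), and that a rank correlation is an acceptable measure of association; the abstract does not say which statistic was used. The function names `gc4` and `spearman` are hypothetical.

```python
import math

# Assumption: standard genetic code. Codons starting with these two bases
# encode the same amino acid whatever the third base (fourfold degenerate).
FOURFOLD_PREFIXES = {"CT", "GT", "TC", "CC", "AC", "GC", "CG", "GG"}


def gc4(cds):
    """G+C fraction at the third position of fourfold degenerate codons."""
    cds = cds.upper()
    gc = total = 0
    # Walk the sequence codon by codon, ignoring any trailing partial codon.
    for i in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[i:i + 3]
        if codon[:2] in FOURFOLD_PREFIXES and codon[2] in "ACGT":
            total += 1
            gc += codon[2] in "GC"
    return gc / total if total else float("nan")


def _ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = _ranks(x), _ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)
```

Given one GC4 value and one Ks estimate per orthologous gene pair, `spearman(gc4_values, ks_values)` returns a coefficient whose sign can be compared with the positive correlation reported for the ML and YN00 estimators.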