Center for Computational Biology and Laboratory of Disease Genomics and Individualized Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100029, China.
Mol Biol Evol. 2012 Oct;29(10):2889-93. doi: 10.1093/molbev/mss104. Epub 2012 Apr 3.
The use of codon substitution models to compare synonymous and nonsynonymous substitution rates is a widely used approach to detecting positive Darwinian selection affecting protein evolution. However, in several recent papers, Hughes and colleagues claim that codon-based likelihood-ratio tests (LRTs) are logically flawed because they lack prior hypotheses and fail to accommodate random fluctuations in synonymous and nonsynonymous substitutions. Friedman and Hughes (2007) also used site-based LRTs to analyze 605 gene families consisting of human and mouse paralogues. They found that the outcome of the tests was largely determined by irrelevant factors, such as the GC content at third codon positions and the synonymous rate d(S), rather than by the nonsynonymous rate d(N) or the d(N)/d(S) ratio, the factors that should be related to selection. Here, we reanalyze those data. Contrary to Friedman and Hughes, we found that the test results are related to sequence length and the average d(N)/d(S) ratio. We examine the criticisms of Hughes and suggest that they are based on misunderstandings of the codon models and on statistical errors. Our analyses suggest that codon-based tests are useful tools for comparative analysis of genomic data sets.
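For readers unfamiliar with how a site-based LRT reaches a verdict, the sketch below shows the arithmetic under stated assumptions. It assumes a nested pair of codon site models that differ by two free parameters, such as the commonly used M7 (beta, null) versus M8 (beta&ω, alternative) comparison; which model pair Friedman and Hughes actually used is not stated in this abstract. The statistic 2Δℓ = 2(ℓ₁ − ℓ₀) is compared with a χ² distribution with 2 degrees of freedom, whose upper-tail probability has the closed form exp(−x/2). The function name `lrt_df2` and the log-likelihood values are hypothetical illustrations, not results from the paper.

```python
import math


def lrt_df2(lnl_null: float, lnl_alt: float) -> tuple:
    """Hypothetical helper: LRT for nested codon models differing by 2 parameters.

    Assumes the null (e.g. M7) is nested in the alternative (e.g. M8) and that
    the chi-square distribution with 2 degrees of freedom is the reference.
    Returns (statistic, p_value).
    """
    # Twice the log-likelihood difference between alternative and null.
    stat = 2.0 * (lnl_alt - lnl_null)
    # Optimizer noise can make the alternative fit marginally worse; clamp at 0.
    stat = max(stat, 0.0)
    # Survival function of chi-square with 2 df is exactly exp(-x / 2).
    p_value = math.exp(-stat / 2.0)
    return stat, p_value


# Illustrative (made-up) log-likelihoods for one gene family.
stat, p = lrt_df2(lnl_null=-2000.0, lnl_alt=-1994.0)
print(stat, p < 0.05)
```

With these made-up values the statistic is 12.0 and the p-value is exp(−6), about 0.0025, so the null of no positively selected sites would be rejected at the 5% level. The closed form avoids a SciPy dependency; for other degrees of freedom a general χ² survival function would be needed.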