Department of Ecology and Evolution, Biophore, Lausanne University, Lausanne, Switzerland.
Mol Biol Evol. 2013 Jul;30(7):1675-86. doi: 10.1093/molbev/mst062. Epub 2013 Apr 4.
Positive selection is widely estimated from protein-coding sequence alignments using the nonsynonymous-to-synonymous substitution rate ratio ω. Increasingly elaborate codon models are used in a likelihood framework for this estimation. Although there is widespread concern about the robustness of ω estimates, more effort is needed to quantify this robustness, especially for complex models. Here, we focused on the branch-site codon model. We investigated its robustness on a large set of simulated data. First, we investigated the impact of sequence divergence. We found evidence that the synonymous substitution rate (dS) is underestimated for values as small as 0.5, with a slight increase in false positives for the branch-site test. As dS increases further, the underestimation of dS worsens, but false positives decrease. Interestingly, the detection of true positives follows a similar distribution, with a maximum at intermediate values of dS. Thus, high dS is more of a concern for loss of power (false negatives) than for false positives of the test. Second, we investigated the impact of GC content. We showed that there is no significant difference in false positives between high-GC (up to ∼80%) and low-GC (∼30%) genes. Moreover, neither shifts in GC content on a specific branch nor major shifts in GC content along the gene sequence generate many false positives. Our results confirm that the branch-site test is very conservative.
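For readers unfamiliar with how a branch-site test call is typically made, the following is a minimal sketch of a likelihood ratio test between a null model and an alternative model that allows positive selection on the foreground branch. The abstract does not describe this step, so everything here is an assumption: the function name `branch_site_lrt` is hypothetical, the log-likelihood values in the usage lines are made up for illustration, and the chi-square reference distribution with 1 degree of freedom is a commonly used convention rather than something stated above.

```python
import math


def branch_site_lrt(lnl_null: float, lnl_alt: float) -> tuple[float, float]:
    """Return (LRT statistic, p-value) for two nested models differing by 1 parameter.

    lnl_null: maximized log-likelihood of the null model (hypothetical input).
    lnl_alt:  maximized log-likelihood of the alternative model (hypothetical input).
    """
    # The statistic is twice the log-likelihood gain of the alternative model.
    # Small negative values can arise from optimizer noise, so clamp at zero.
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    # Survival function of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Illustrative values only, not taken from the study.
stat, p_value = branch_site_lrt(lnl_null=-1002.0, lnl_alt=-1000.0)
print(f"2*dlnL = {stat:.3f}, p = {p_value:.4f}")
```

In a simulation study of the kind summarized above, a false positive would be a p-value below the chosen threshold on data simulated without positive selection, and a false negative would be a p-value above it on data simulated with positive selection.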