Biomedical Informatics Research Division, eHealth Research and Innovation Platform, Medical Research Council, Tygerberg, South Africa.
PLoS Genet. 2012;8(7):e1002764. doi: 10.1371/journal.pgen.1002764. Epub 2012 Jul 12.
The imprint of natural selection on protein-coding genes is often difficult to identify because selection is frequently transient or episodic, i.e., it affects only a subset of lineages. Existing computational techniques, which are designed to identify sites subject to pervasive selection, may fail to recognize sites where selection is episodic, and such sites make up a large proportion of positively selected sites. We present a mixed effects model of evolution (MEME) that is capable of identifying instances of both episodic and pervasive positive selection at the level of an individual site. Using empirical and simulated data, we demonstrate the superior performance of MEME over older models under a broad range of scenarios. We find that episodic selection is widespread and conclude that the number of sites experiencing positive selection may have been vastly underestimated.