Keightley Peter D, Campos José L, Booker Tom R, Charlesworth Brian
Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, EH9 3FL, United Kingdom
Genetics. 2016 Jun;203(2):975-84. doi: 10.1534/genetics.116.188102. Epub 2016 Apr 20.
Many approaches for inferring adaptive molecular evolution analyze the unfolded site frequency spectrum (SFS), a vector of counts of sites with different numbers of copies of derived alleles in a sample of alleles from a population. Accurate inference of the high-copy-number elements of the SFS is difficult, however, because of misassignment of alleles as derived vs. ancestral. This is a known problem with parsimony using outgroup species. Here we show that the problem is particularly serious if there is variation in the substitution rate among sites brought about by variation in selective constraint levels. We present a new method for inferring the SFS using one or two outgroups that attempts to overcome the problem of misassignment. We show that two outgroups are required for accurate estimation of the SFS if there is substantial variation in selective constraints, which is expected to be the case for nonsynonymous sites in protein-coding genes. We apply the method to estimate unfolded SFSs for synonymous and nonsynonymous sites in a population of Drosophila melanogaster from phase 2 of the Drosophila Population Genomics Project. We use the unfolded spectra to estimate the frequency and strength of advantageous and deleterious mutations and estimate that ∼50% of amino acid substitutions are positively selected but that <0.5% of new amino acid mutations are beneficial, with a scaled selection strength of N_e s ≈ 12.
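As an illustration of the misassignment problem described in the abstract, the following minimal Python sketch builds an unfolded SFS from per-site derived-allele counts and shows how single-outgroup parsimony can move a low-frequency site into a high-copy-number class. It is not the authors' method. The function names, the toy sites, and the sample size of six alleles are hypothetical, and the sketch assumes biallelic sites where the outgroup allele matches one of the alleles in the sample.

```python
def unfolded_sfs(derived_counts, n):
    """Return a list of length n + 1 whose k-th entry is the number of
    sites carrying k copies of the derived allele in a sample of n alleles."""
    sfs = [0] * (n + 1)
    for k in derived_counts:
        if not 0 <= k <= n:
            raise ValueError(f"derived count {k} outside 0..{n}")
        sfs[k] += 1
    return sfs


def derived_count(sample_alleles, ancestral_allele):
    """Count sample alleles that differ from the allele treated as ancestral."""
    return sum(1 for allele in sample_alleles if allele != ancestral_allele)


# Hypothetical toy data: (sample alleles, true ancestral allele, outgroup allele).
# At the second site the outgroup carries the allele that is truly derived,
# for example because of an independent substitution on the outgroup lineage.
sites = [
    ("AAAAAG", "A", "A"),  # true singleton, outgroup agrees with the ancestor
    ("CCCCCT", "C", "T"),  # true singleton, outgroup misleads parsimony
    ("GGGGGG", "G", "G"),  # monomorphic site
    ("AAATTT", "A", "A"),  # three derived copies, outgroup agrees
]
n = 6

true_counts = [derived_count(s, anc) for s, anc, _ in sites]
# Single-outgroup parsimony: treat the outgroup allele as ancestral.
inferred_counts = [derived_count(s, out) for s, _, out in sites]

true_sfs = unfolded_sfs(true_counts, n)
inferred_sfs = unfolded_sfs(inferred_counts, n)

print("true SFS:    ", true_sfs)
print("inferred SFS:", inferred_sfs)
```

In this toy example the mispolarized singleton is counted as a site with five derived copies out of six, which inflates the high-copy-number end of the spectrum, the part of the SFS that the abstract identifies as hardest to infer accurately.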