Thornton Kevin R, Jensen Jeffrey D
Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York 14853, USA.
Genetics. 2007 Feb;175(2):737-50. doi: 10.1534/genetics.106.064642. Epub 2006 Nov 16.
Rapid typing of genetic variation at many regions of the genome is an efficient way to survey variability in natural populations in an effort to identify segments of the genome that have experienced recent natural selection. Following such a genome scan, individual regions may be chosen for further sequencing and a more detailed analysis of patterns of variability, often to perform a parametric test for selection and to estimate the strength of a recent selective sweep. We show here that not accounting for the ascertainment of loci in such analyses leads to false inference of natural selection when the true model is selective neutrality, because the procedure of choosing unusual loci (in comparison to the rest of the genome-scan data) selects regions of the genome with genealogies similar to those expected under models of recent directional selection. We describe a simple and efficient correction for this ascertainment bias, which restores the false-positive rate to near-nominal levels. For the parameters considered here, we find that obtaining a test with the expected distribution of P-values depends on accurately accounting both for ascertainment of regions and for demography. Finally, we use simulations to explore the utility of relying on outlier loci to detect recent selective sweeps. We find that measures of diversity and of population differentiation are more effective than summaries of the site-frequency spectrum and that sequencing larger regions (2.5 kbp) in genome-scan studies leads to more power to detect recent selective sweeps.
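The ascertainment effect described above can be illustrated with a toy simulation. This is a minimal sketch under strong assumptions, not the authors' procedure. It replaces coalescent simulation with a pair of correlated standard-normal statistics per neutral locus: a genome-scan statistic and a follow-up statistic. The correlation `rho`, the 5% outlier tail, and all function names are hypothetical choices made for illustration. Loci are selected as low-tail outliers of the scan statistic, and the follow-up statistic is then tested against either an unconditional neutral null (naive) or a neutral null conditioned on passing the same outlier cutoff (corrected).

```python
import bisect
import random


def simulate_locus(rng, rho):
    """Return (scan_stat, followup_stat) for one neutral locus.

    Toy assumption: both statistics are standard normal with correlation
    rho, standing in for summaries that share the same genealogy.
    """
    scan = rng.gauss(0.0, 1.0)
    followup = rho * scan + (1.0 - rho ** 2) ** 0.5 * rng.gauss(0.0, 1.0)
    return scan, followup


def lower_tail_p(value, sorted_null):
    """Empirical lower-tail P-value of value against a sorted null sample."""
    return bisect.bisect_right(sorted_null, value) / len(sorted_null)


def false_positive_rates(n_loci=20000, n_null=20000, tail=0.05,
                         alpha=0.05, rho=0.8, seed=1):
    """Return (naive_rate, corrected_rate) when every locus is neutral."""
    rng = random.Random(seed)

    # "Observed" genome scan: all loci neutral; keep the low-tail outliers.
    data = [simulate_locus(rng, rho) for _ in range(n_loci)]
    cutoff = sorted(s for s, _ in data)[int(tail * n_loci)]
    chosen = [f for s, f in data if s <= cutoff]

    # Independent neutral simulations used to build the two null distributions.
    null = [simulate_locus(rng, rho) for _ in range(n_null)]
    naive_null = sorted(f for _, f in null)                 # ignores ascertainment
    cond_null = sorted(f for s, f in null if s <= cutoff)   # conditions on it

    naive = sum(lower_tail_p(f, naive_null) <= alpha for f in chosen) / len(chosen)
    corrected = sum(lower_tail_p(f, cond_null) <= alpha for f in chosen) / len(chosen)
    return naive, corrected


if __name__ == "__main__":
    naive, corrected = false_positive_rates()
    print(f"naive false-positive rate:     {naive:.3f}")
    print(f"corrected false-positive rate: {corrected:.3f}")
```

In this toy setting the naive rate lands far above the nominal `alpha`, while the conditioned null brings it back to roughly `alpha`. A realistic analysis would replace `simulate_locus` with simulations under the appropriate demographic model, which the abstract notes is also required for well-calibrated P-values.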