Flanagan Sarah P, Jones Adam G
Biology Department, Texas A&M University, 3258 TAMU, College Station, TX 77843; and National Institute for Mathematical and Biological Synthesis, University of Tennessee, Knoxville, TN.
J Hered. 2017 Jul 1;108(5):561-573. doi: 10.1093/jhered/esx048.
The FST-heterozygosity outlier approach has been a popular method for identifying loci under balancing and positive selection since Beaumont and Nichols first proposed it in 1996 and recommended its use for studies sampling a large number of independent populations (at least 10). Since then, their program FDIST2 and a user-friendly program optimized for large datasets, LOSITAN, have been used widely in the population genetics literature, often without the requisite number of samples. We observed empirical datasets whose distributions could not be reconciled with the confidence intervals generated by the null coalescent island model. Here, we use forward-in-time simulations to investigate circumstances under which the FST-heterozygosity outlier approach performs poorly for next-generation single nucleotide polymorphism (SNP) datasets. Our results show that samples involving few independent populations, particularly when migration rates are low, result in distributions of the FST-heterozygosity relationship that are not described by the null model implemented in LOSITAN. In addition, even under favorable conditions LOSITAN rarely provides confidence intervals that precisely fit SNP data, making the associated P-values only roughly valid at best. We present an alternative method, implemented in a new R package named fsthet, which uses the raw empirical data to generate smoothed outlier plots for the FST-heterozygosity relationship.
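The idea of deriving outlier thresholds from the raw empirical data, rather than from a simulated null model, can be illustrated with a minimal Python sketch. It is not the fsthet implementation and does not reproduce its API. The function names (`locus_stats`, `empirical_quantile`, `flag_outliers`), the equal-width heterozygosity bins, the unsmoothed quantiles, and the Nei-style estimator FST = (HT - HS) / HT for biallelic loci are all assumptions chosen for illustration.

```python
def locus_stats(freqs):
    """Return (H_T, F_ST) for one biallelic locus.

    freqs: frequency of one allele in each sampled population.
    Assumed estimator (hypothetical choice for this sketch):
    F_ST = (H_T - H_S) / H_T, with equal population weights.
    """
    n = len(freqs)
    p_bar = sum(freqs) / n
    h_t = 2.0 * p_bar * (1.0 - p_bar)                      # total expected heterozygosity
    h_s = sum(2.0 * p * (1.0 - p) for p in freqs) / n      # mean within-population value
    fst = (h_t - h_s) / h_t if h_t > 0 else 0.0
    return h_t, fst


def empirical_quantile(values, q):
    """Simple order-statistic quantile of a non-empty list (no interpolation)."""
    ordered = sorted(values)
    idx = min(len(ordered) - 1, max(0, int(q * len(ordered))))
    return ordered[idx]


def flag_outliers(loci, n_bins=10, alpha=0.05, min_per_bin=5):
    """Flag loci whose F_ST lies outside empirical quantiles of their H_T bin.

    loci: list of (h_t, fst) tuples, with h_t in [0, 0.5] for biallelic SNPs.
    Returns the sorted indices of flagged loci.
    """
    # Group locus indices into equal-width heterozygosity bins.
    bins = {}
    for i, (h_t, _) in enumerate(loci):
        b = min(n_bins - 1, int(h_t / 0.5 * n_bins))
        bins.setdefault(b, []).append(i)

    flagged = []
    for members in bins.values():
        if len(members) < min_per_bin:
            continue  # too few loci to estimate quantiles in this bin
        fsts = [loci[i][1] for i in members]
        lo = empirical_quantile(fsts, alpha / 2.0)
        hi = empirical_quantile(fsts, 1.0 - alpha / 2.0)
        # Strict inequalities: loci exactly at a threshold are not flagged.
        flagged.extend(i for i in members if loci[i][1] < lo or loci[i][1] > hi)
    return sorted(flagged)
```

Because the thresholds are quantiles of the observed data, such a procedure flags at most roughly a fraction alpha of the loci in each bin whether or not selection is acting, so flagged loci are best read as candidates for follow-up rather than as results of a formal test.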