Columbia River Inter-Tribal Fish Commission, Hagerman Fish Culture Experiment Station, ID 83332, USA.
Mol Ecol Resour. 2011 Mar;11 Suppl 1:184-94. doi: 10.1111/j.1755-0998.2011.02987.x. Epub 2011 Feb 6.
Genome scans with many genetic markers provide the opportunity to investigate local adaptation in natural populations and to identify candidate genes under selection. In particular, SNPs are dense throughout the genome of most organisms and are commonly observed in functional genes, making them ideal markers for studying adaptive molecular variation. This approach is now commonly employed in ecological and population genetics studies to detect outlier loci that are putatively under selection. However, outlier approaches face several challenges, including genotyping errors, underlying population structure and false positives, variation in mutation rate, and limited sensitivity (false negatives). In this study, we evaluated multiple outlier tests and their type I (false positive) and type II (false negative) error rates in a series of simulated data sets. Comparisons included simulation procedures (FDIST2, ARLEQUIN v.3.5 and BAYESCAN) as well as more conventional tools such as global F(ST) histograms. Of the three simulation methods, FDIST2 and BAYESCAN typically had the lowest type II error, BAYESCAN had the lowest type I error, and ARLEQUIN had the highest type I and type II error. High error rates in ARLEQUIN with a hierarchical approach were partially because of confounding scenarios where patterns of adaptive variation were contrary to neutral structure; however, ARLEQUIN consistently had the highest type I and type II error in all four simulation scenarios tested in this study. Given these results, it is important that outlier loci are interpreted cautiously and that the error rates of the various methods are taken into consideration in studies of adaptive molecular variation, especially when hierarchical structure is included.
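To make the type I and type II error rates concrete, the following is a minimal sketch of how they might be tallied for one simulated data set in which the loci under selection are known. It is not the procedure used in the study: the function name `error_rates`, the locus indexing, and the choice of denominators (false positives over neutral loci, missed loci over selected loci) are assumptions made for illustration only.

```python
def error_rates(n_loci, selected, flagged):
    """Return (type I rate, type II rate) for one simulated data set.

    Assumed definitions (not taken from the study):
      type I  = neutral loci flagged as outliers / all neutral loci
      type II = selected loci not flagged        / all selected loci
    """
    selected = set(selected)
    flagged = set(flagged)
    neutral = set(range(n_loci)) - selected

    false_positives = flagged & neutral
    false_negatives = selected - flagged

    type1 = len(false_positives) / len(neutral) if neutral else 0.0
    type2 = len(false_negatives) / len(selected) if selected else 0.0
    return type1, type2


# Hypothetical example: 100 loci, loci 0-9 simulated under selection;
# an outlier test flags loci 0-6 plus two neutral loci (50 and 51).
t1, t2 = error_rates(100, range(10), [0, 1, 2, 3, 4, 5, 6, 50, 51])
print(f"type I = {t1:.3f}, type II = {t2:.3f}")
```

In this hypothetical case two of the 90 neutral loci are flagged and three of the 10 selected loci are missed. Comparing methods would then amount to repeating the tally on the outlier list produced by each program for the same simulated data.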