Vatsiou Alexandra I, Bazin Eric, Gaggiotti Oscar E
Laboratoire d'Ecologie Alpine, UMR CNRS 5553, Université Joseph Fourier, Grenoble, France.
Scottish Oceans Institute, East Sands, University of St Andrews, St Andrews, KY16 8LB, UK.
Mol Ecol. 2016 Jan;25(1):89-103. doi: 10.1111/mec.13360. Epub 2015 Oct 12.
Identifying genomic regions targeted by positive selection has been a long-standing interest of evolutionary biologists. This objective was difficult to achieve until the recent emergence of next-generation sequencing, which is fostering the development of large-scale catalogues of genetic variation for an increasing number of species. Several statistical methods have recently been developed to analyse these rich data sets, but the conditions under which they produce reliable results remain poorly understood. This study aims to fill this gap by assessing the performance of genome-scan methods that explicitly consider the physical linkage among SNPs surrounding a selected variant. We compare seven recent methods for the detection of selective sweeps (iHS, nSL, EHHST, XP-EHH, XP-EHHST, XPCLR and hapFLK). We use an individual-based simulation approach to investigate the power and accuracy of these methods across a wide range of population models, under both hard and soft sweeps. Our results indicate that XPCLR and hapFLK perform best and can detect soft sweeps under simple population structure scenarios if the migration rate is low. All methods perform poorly with moderate-to-high migration rates or with weak selection, and very poorly under a hierarchical population structure. Finally, no single method is able to detect both starting and nearly completed selective sweeps. However, combining several methods (XPCLR or hapFLK with iHS or nSL) can greatly increase the power to pinpoint the selected region.
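To make concrete what it means for a method to consider the physical linkage among SNPs around a selected variant, the following is a minimal sketch of extended haplotype homozygosity (EHH), the quantity on which statistics such as iHS and XP-EHH are built. It is an illustration only, not the pipeline used in the study: the function name `ehh`, the 0/1 haplotype encoding and the toy data are assumptions made for this example.

```python
from collections import Counter
from math import comb


def ehh(haplotypes, core, site, core_allele=1):
    """Extended haplotype homozygosity from `core` out to `site`.

    Hypothetical helper for illustration. `haplotypes` is a list of
    equal-length sequences of 0/1 alleles; `core` and `site` are SNP indices.
    Returns the probability that two haplotypes carrying `core_allele`,
    drawn without replacement, are identical at every SNP between the
    core and `site` (inclusive).
    """
    lo, hi = sorted((core, site))
    # Keep only haplotypes carrying the chosen allele at the core SNP,
    # and cut out the segment spanning core..site.
    carriers = [tuple(h[lo:hi + 1]) for h in haplotypes if h[core] == core_allele]
    n = len(carriers)
    if n < 2:
        return 0.0  # homozygosity is undefined with fewer than two carriers
    counts = Counter(carriers)
    # Pairs of identical segments divided by all possible pairs.
    return sum(comb(c, 2) for c in counts.values()) / comb(n, 2)


# Toy data (assumed): six haplotypes, five SNPs, core SNP at index 2.
haps = [
    [0, 1, 1, 0, 1],
    [0, 1, 1, 0, 1],
    [0, 1, 1, 0, 0],
    [1, 1, 1, 0, 1],
    [0, 0, 0, 1, 0],
    [1, 0, 0, 1, 1],
]

# EHH decays with distance from the core as haplotypes diverge.
decay = [ehh(haps, core=2, site=s) for s in range(5)]
```

In a sweep region, haplotypes carrying the favoured allele are expected to stay identical over a longer stretch than those carrying the alternative allele, so EHH decays more slowly around the core. Within-population statistics such as iHS compare this decay between the two alleles of one population, while cross-population statistics such as XP-EHH compare it between populations.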