University of Grenoble-Alpes, University of Savoy Mont Blanc, CNRS, LECA, Grenoble, France.
National Institute of Agronomic Research (INRA Maroc), Regional Centre of Agronomic Research, Beni-Mellal, Morocco.
Mol Ecol Resour. 2019 Nov;19(6):1497-1515. doi: 10.1111/1755-0998.13070. Epub 2019 Sep 9.
Whole genome sequences (WGS) greatly increase our ability to precisely infer population genetic parameters, demographic processes, and selection signatures. However, WGS may still not be affordable for a representative number of individuals or populations. In this context, our goal was to assess the efficiency of several SNP genotyping strategies by testing their ability to accurately estimate parameters describing neutral diversity and to detect signatures of selection. We analysed 110 WGS at 12× coverage for four different species, i.e., sheep, goats and their wild counterparts. From these data we generated 946 data sets corresponding to random panels of 1K to 5M variants, commercial SNP chips and exome capture, for sample sizes of five to 48 individuals. We also simulated low-coverage genome resequencing at 1×, 2× and 5× by randomly subsampling reads from the 12× resequencing data. Overall, 5K to 10K random variants were enough for an accurate estimation of genome diversity. In contrast, commercial panels and exome capture displayed strong ascertainment biases. Beyond the characterization of neutral diversity, the detection of signatures of selection and the accurate estimation of linkage disequilibrium (LD) required high-density panels of at least 1M variants. Finally, genotype likelihoods increased the quality of variant calling from low-coverage resequencing, but the proportions of incorrect genotypes remained substantial, especially at heterozygous sites. Whole genome resequencing coverage of at least 5× appeared to be necessary for an accurate assessment of genomic variation. These results have implications for studies seeking to deploy low-density SNP collections or genome scans, for a wide variety of purposes, across genetically diverse populations or species showing similar genetic characteristics and patterns of LD decay.
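The random-panel comparison described in the abstract can be illustrated with a toy simulation. The sketch below is not the authors' pipeline: the function names (`expected_heterozygosity`, `simulate_genotypes`, `random_panel`), the simulated allele-frequency range, and the data sizes are all assumptions chosen for illustration. It draws independent variants, so it shows only how a diversity estimate from a random subset of variants compares with the full set. It does not model LD, ascertainment bias, or genotyping error.

```python
import random


def expected_heterozygosity(genotypes):
    """Mean expected heterozygosity (2pq) across variants.

    `genotypes` is a list of variants; each variant is a list of
    alternate-allele counts (0, 1 or 2), one per diploid individual.
    """
    total = 0.0
    for variant in genotypes:
        p = sum(variant) / (2 * len(variant))  # alternate-allele frequency
        total += 2 * p * (1 - p)
    return total / len(genotypes)


def simulate_genotypes(n_variants, n_individuals, rng):
    """Toy data (assumption): independent variants, uniform allele frequencies."""
    data = []
    for _ in range(n_variants):
        p = rng.uniform(0.05, 0.95)
        # Each diploid genotype is the sum of two independent allele draws.
        variant = [int(rng.random() < p) + int(rng.random() < p)
                   for _ in range(n_individuals)]
        data.append(variant)
    return data


def random_panel(genotypes, panel_size, rng):
    """Pick `panel_size` variants uniformly at random, without replacement."""
    return rng.sample(genotypes, panel_size)


rng = random.Random(42)
full = simulate_genotypes(20_000, 20, rng)  # stand-in for the full variant set
full_he = expected_heterozygosity(full)
print(f"full set: He = {full_he:.4f}")

for size in (1_000, 5_000, 10_000):
    panel_he = expected_heterozygosity(random_panel(full, size, rng))
    print(f"{size:>6} variants: He = {panel_he:.4f} "
          f"(difference {panel_he - full_he:+.4f})")
```

Because the panels here are sampled uniformly from the full set, their estimates are unbiased by construction, and larger panels only reduce sampling noise. A commercial chip or exome capture would correspond to a non-random choice of variants, which this sketch does not attempt to reproduce.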