Department of Animal Sciences, Animal Breeding and Genetics Group, University of Goettingen, Göttingen, Germany.
Center for Integrated Breeding Research, University of Goettingen, Göttingen, Germany.
PLoS One. 2021 Mar 30;16(3):e0245178. doi: 10.1371/journal.pone.0245178. eCollection 2021.
Single nucleotide polymorphisms (SNPs), genotyped with arrays, have become a widely used marker type in population genetic analyses over the last 10 years. However, compared to whole genome re-sequencing data, arrays are known to lack a substantial proportion of globally rare variants and tend to be biased towards variants present in the populations involved in the development of the respective array. This affects population genetic estimators and is known as SNP ascertainment bias. We investigated factors contributing to ascertainment bias in array development by redesigning the Axiom™ Genome-Wide Chicken Array in silico and evaluating changes in allele frequency spectra and heterozygosity estimates in a stepwise manner. The redesign showed a sequential reduction of rare alleles during the development process. This was mainly caused by the identification of SNPs in a limited set of populations and by a within-population selection of common SNPs when aiming for equidistant spacing. These effects were less severe with a larger discovery panel. Additionally, expected heterozygosity was in general massively overestimated for the ascertained SNP sets. For the original array, this overestimation was 24% higher for populations involved in the discovery process than for populations not involved. The same was observed after the SNP discovery step in the redesign. However, an unequal contribution of populations during SNP selection can mask this effect, while also adding uncertainty. Finally, we make suggestions for the design of specialized arrays for large-scale projects where whole genome re-sequencing is still too expensive.
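The direction of the heterozygosity bias described above can be illustrated with a toy calculation. The following is only a minimal sketch under stated assumptions, not the pipeline used in the study: allele frequencies are drawn from an arbitrary Beta distribution skewed towards rare alleles, expected heterozygosity is taken as the mean of 2p(1-p) over biallelic SNPs without any sample-size correction, and "ascertainment" is reduced to a simple minor allele frequency cutoff. The function names, the Beta parameters, and the 0.05 threshold are all hypothetical choices made for illustration.

```python
import random


def expected_heterozygosity(freqs):
    """Mean of 2p(1-p) over biallelic SNPs, given allele frequencies p."""
    if not freqs:
        raise ValueError("need at least one SNP")
    return sum(2.0 * p * (1.0 - p) for p in freqs) / len(freqs)


def ascertain(freqs, min_maf):
    """Toy ascertainment: keep SNPs with minor allele frequency >= min_maf."""
    return [p for p in freqs if min(p, 1.0 - p) >= min_maf]


def simulate_frequencies(n_snps, seed=0):
    """Draw allele frequencies skewed towards rare alleles.

    Beta(0.2, 2.0) is an arbitrary assumption for this sketch; it is not
    fitted to chicken data or to any real frequency spectrum.
    """
    rng = random.Random(seed)
    return [rng.betavariate(0.2, 2.0) for _ in range(n_snps)]


if __name__ == "__main__":
    all_snps = simulate_frequencies(10_000, seed=1)
    array_snps = ascertain(all_snps, min_maf=0.05)  # hypothetical cutoff
    print("SNPs retained:", len(array_snps), "of", len(all_snps))
    print("H_e, all SNPs:  ", expected_heterozygosity(all_snps))
    print("H_e, array SNPs:", expected_heterozygosity(array_snps))
```

Because 2p(1-p) increases with minor allele frequency, discarding rare variants can only raise the mean, so the filtered set yields a higher heterozygosity estimate than the full set. The sketch does not model population-specific discovery panels or equidistant spacing, which the abstract identifies as the main drivers of the bias.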