Animal and Grassland Research and Innovation Centre, Teagasc, Moorepark, Fermoy, Co. Cork, Ireland.
Laboratory of Animal Reproduction, Department of Biological Sciences, Faculty of Science and Engineering, University of Limerick, Limerick, Ireland.
J Anim Sci. 2019 Apr 3;97(4):1550-1567. doi: 10.1093/jas/skz043.
The objective of the present study was to quantify the accuracy of imputing medium-density single nucleotide polymorphism (SNP) genotypes from lower-density panels (384 to 12,000 SNPs) derived using alternative methods of selecting the most informative SNPs. Four different selection methods were used to select SNPs based on genomic characteristics (i.e., minor allele frequency (MAF) and linkage disequilibrium (LD)) within each of five sheep breeds (642 Belclare, 645 Charollais, 715 Suffolk, 440 Texel, and 620 Vendeen) separately. Selection methods evaluated included (i) random, (ii) splitting the genome into blocks of equal length and selecting SNPs within each block based on MAF and LD patterns, (iii) equidistant location while optimizing MAF, and (iv) a combination of MAF, distance from already selected SNPs, and weak LD with the SNP(s) already selected. All animals were genotyped on the Illumina OvineSNP50 Beadchip containing 51,135 SNPs, of which 44,040 remained after edits. Within each breed separately, the youngest 100 animals were assumed to represent the validation population; the remaining animals represented the reference population. Imputation was undertaken under three different conditions: (i) SNPs were selected within a given breed and imputed for all breeds individually, (ii) all breeds were collectively used to select SNPs and were included as the reference population, and (iii) SNPs were selected for each breed separately and imputation was undertaken for all breeds, with the breed from which the SNPs were selected excluded from the reference population. Regardless of SNP selection method, mean animal allele concordance rate improved at a diminishing rate, and the variability in mean animal allele concordance rate decreased, as the panel density increased. The SNP selection method impacted the accuracy of imputation, although the effect diminished as the density of the panel increased.
Overall, the most accurate SNP selection method for panels with <9,000 SNPs was that based on MAF and LD pattern within genomic blocks. The mean animal allele concordance rate varied from 0.89 in Texel to 0.97 in Vendeen. Greater imputation accuracy was achieved when SNPs were selected and imputed within each breed individually compared with when SNPs were selected across all breeds and imputed using a multi-breed reference population. In all, results indicate that accurate genotype imputation to medium density is achievable with low-density genotype panels with at least 6,000 SNPs.
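The abstract reports accuracy as a mean animal allele concordance rate but does not spell out the formula. The sketch below shows one common way such a rate can be computed, under the assumption that genotypes are coded as 0, 1, or 2 copies of one allele and that a genotype off by one allele counts as half correct. The function name `animal_allele_concordance` and the toy arrays are hypothetical and are not taken from the study.

```python
import numpy as np

def animal_allele_concordance(true_geno, imputed_geno):
    """Per-animal allele concordance rate (assumed definition).

    Both inputs are (animals x SNPs) arrays of genotypes coded 0/1/2.
    For each genotype, the number of matching alleles is taken as
    2 - |true - imputed|, so a heterozygote imputed as a homozygote
    scores 1 of 2 alleles correct.
    """
    true_geno = np.asarray(true_geno, dtype=float)
    imputed_geno = np.asarray(imputed_geno, dtype=float)
    if true_geno.shape != imputed_geno.shape:
        raise ValueError("genotype arrays must have the same shape")
    matching_alleles = 2.0 - np.abs(true_geno - imputed_geno)
    total_alleles = 2.0 * true_geno.shape[1]
    return matching_alleles.sum(axis=1) / total_alleles

# Toy validation data: 2 animals x 4 masked SNPs (illustrative values only).
true_geno = np.array([[0, 1, 2, 1],
                      [2, 2, 0, 1]])
imputed_geno = np.array([[0, 1, 1, 1],
                         [0, 2, 0, 1]])

rates = animal_allele_concordance(true_geno, imputed_geno)
mean_rate = rates.mean()
print(rates, mean_rate)
```

In a validation design like the one described, `true_geno` would hold the medium-density genotypes of the validation animals at the SNPs masked from the low-density panel, and `imputed_geno` the imputed calls at those same SNPs.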