Friedenberg S G, Meurs K M
Department of Clinical Sciences and Comparative Medicine Institute, North Carolina State University College of Veterinary Medicine, 1060 William Moore Drive, Raleigh, NC, 27607, USA.
Mamm Genome. 2016 Oct;27(9-10):485-94. doi: 10.1007/s00335-016-9636-9. Epub 2016 Apr 29.
Application of imputation methods to accurately predict a dense array of SNP genotypes in the dog could provide an important supplement to current analyses of array-based genotyping data. Here, we developed a reference panel of 4,885,283 SNPs in 83 dogs across 15 breeds using whole genome sequencing. We used this panel to predict the genotypes of 268 dogs across three breeds with 84,193 SNP array-derived genotypes as inputs. We then (1) performed breed clustering of the actual and imputed data; (2) evaluated several reference panel breed combinations to determine an optimal reference panel composition; and (3) compared the accuracy of two commonly used software algorithms (Beagle and IMPUTE2). Breed clustering was well preserved in the imputation process across eigenvalues representing 75% of the variation in the imputed data. Using Beagle with a target panel from a single breed, genotype concordance was higher with a multi-breed reference panel (92.4%) than with a breed-specific reference panel (87.0%) or a reference panel containing no breeds overlapping with the target panel (74.9%). This finding was confirmed using target panels derived from two other breeds. Additionally, using the multi-breed reference panel, genotype concordance was slightly higher with IMPUTE2 (94.1%) than with Beagle; Pearson correlation coefficients were slightly higher for both software packages (0.946 for Beagle, 0.961 for IMPUTE2). Our findings demonstrate that genotype imputation from SNP array-derived data to whole genome-level genotypes is both feasible and accurate in the dog, provided there is appropriate breed overlap between the target and reference panels.
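The abstract reports accuracy as genotype concordance and as a Pearson correlation coefficient but does not spell out how either was computed. The following is a minimal sketch under stated assumptions, not the authors' pipeline: genotypes are coded as alternate-allele counts (0, 1, 2), concordance is taken as the fraction of sites where the imputed call equals the actual call, and missing calls (`None`) are skipped. The function names and the toy data are hypothetical.

```python
from math import sqrt


def _comparable_pairs(actual, imputed):
    """Keep only sites where both the actual and the imputed call are present."""
    return [(a, b) for a, b in zip(actual, imputed)
            if a is not None and b is not None]


def genotype_concordance(actual, imputed):
    """Fraction of comparable sites whose imputed genotype equals the actual one.

    Assumption: genotypes are alternate-allele counts (0, 1 or 2).
    """
    pairs = _comparable_pairs(actual, imputed)
    if not pairs:
        raise ValueError("no comparable genotype calls")
    return sum(a == b for a, b in pairs) / len(pairs)


def pearson_r(actual, imputed):
    """Pearson correlation between actual and imputed allele counts."""
    pairs = _comparable_pairs(actual, imputed)
    if not pairs:
        raise ValueError("no comparable genotype calls")
    n = len(pairs)
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs)
    var_b = sum((b - mean_b) ** 2 for _, b in pairs)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for constant genotypes")
    return cov / sqrt(var_a * var_b)


# Toy data (hypothetical): eight sites for one dog, one imputation error.
actual = [0, 1, 2, 2, 1, 0, 1, 2]
imputed = [0, 1, 2, 1, 1, 0, 1, 2]

print(genotype_concordance(actual, imputed))  # 7 of 8 calls match: 0.875
print(round(pearson_r(actual, imputed), 3))
```

The two metrics answer slightly different questions: concordance counts every mismatch equally, whereas the correlation penalizes a 0-versus-2 error more than a 1-versus-2 error, which is one reason the two figures need not agree.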