Judge M M, Purfield D C, Sleator R D, Berry D P
J Anim Sci. 2017 Apr;95(4):1489-1501. doi: 10.2527/jas.2016.1212.
The objective of the present study was to quantify, using simulations, the impact of successive generations of genotype imputation on genomic predictions. The study also investigated how using a small reference population of true genotypes, versus a larger reference population of imputed genotypes, affects the accuracy of genomic predictions. After construction of a founder population, high-density (HD) genotypes (n = 43,500 single nucleotide polymorphisms, SNP) were simulated across 25 generations (n = 46,800 per generation). A low-density genotype panel (n = 3,000 SNP) was developed from these HD genotypes and then used to impute genotypes under 7 alternative imputation strategies. Phenotypes with low (0.03) and moderate (0.35) heritability were simulated. Direct genomic values (DGV) were estimated using the imputed genotypes from each investigated scenario, and the accuracy of predicting the simulated true breeding values (TBV) was expressed relative to the accuracy obtained when the true genotypes were used. Both the mean allele concordance rate and the rate of change in mean allele concordance per generation differed among the imputation strategies investigated. Imputation was most accurate when the true HD genotypes of the sires and of 50% of the dams of the generation being imputed were included in the reference population; the average allele concordance rate for this scenario across generations was 0.9707. The strongest correlation between the TBV and DGV of the last generation was obtained when the reference population included the sequentially imputed HD genotypes of all previous generations plus the true HD genotypes of all sires of the previous generations, a relative efficiency of 0.987 compared with using the true genotypes in the reference population. With a moderate heritability, the correlation between the TBV and the DGV was, on average, 0.07 units stronger when a small reference population of accurate genotypes was used than when the DGV were generated from a larger population of imputed genotypes.
When the heritability was low, the accuracy of genomic predictions benefited from a larger reference population, even if the SNP were imputed. The effect of accumulated imputation errors across generations on the accuracy of genomic predictions indicates a need to routinely generate HD genotypes for influential animals, so as to limit the accumulation of imputation errors over generations.