Li Zhi, He Jun, Jiang Jun, G Tait Richard, Bauck Stewart, Guo Wei, Wu Xiao Lin
College of Animal Science and Technology, Hunan Agricultural University, Changsha 410128, China.
Department of Animal Science, University of Wyoming, Laramie WY 82071, USA.
Yi Chuan. 2019 Jul 20;41(7):644-652. doi: 10.16288/j.yczz.18-319.
Single nucleotide polymorphism (SNP) chips have been widely used in genetic studies and breeding applications in animal and plant species, and the quality of SNP genotypes is of paramount importance. Some genotypes commonly fail and must be imputed. In other situations, loci that were not genotyped on one chip must be imputed from another chip, or high-density genotypes must be imputed from low-density genotypes. In all of these cases, the validity and reliability of subsequent data analyses depend on the accuracy of the imputed genotypes. To better understand the factors affecting imputation accuracy, the present study investigated the effects of SNP genotyping call rate and SNP genotyping error rate on genotype imputation accuracy under two scenarios, using 20,116 U.S. Holstein cattle, each genotyped with a GGP 50K SNP chip. In scenario 1, the two factors were independent of each other: the simulated genotyping call rate varied from 50% to 100% and the simulated genotyping error rate varied from 0% to 50%. In scenario 2, the genotyping error rate was correlated with the genotyping call rate, and the relationship was established by fitting a linear regression model between the two variables on a real dataset; accordingly, as the simulated SNP call rate decreased from 100% to 50%, the simulated SNP genotyping error rate increased from 0% to 13.55%. Finally, 5-fold cross-validation was used to assess the resulting imputation accuracy. The results showed that when the original SNP genotyping call rate was independent of the SNP genotyping error rate, imputation accuracy did not change significantly with the original genotyping call rate (P>0.05) but decreased significantly as the genotyping error rate increased (P<0.01).
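The two simulation scenarios can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: genotypes are assumed to be coded 0/1/2 with -1 marking a failed call, failures and errors are assumed to be applied independently to each genotype, and the scenario 2 line is assumed to pass through the two endpoints quoted in the abstract (0% error at 100% call rate, 13.55% error at 50% call rate), since the fitted regression coefficients are not reported here. All function names are hypothetical, and the imputation step itself is not shown.

```python
import random

MISSING = -1  # hypothetical code for a failed (uncalled) genotype


def scenario2_error_rate(call_rate: float) -> float:
    """Scenario 2: error rate as a linear function of call rate (assumption).

    Only the endpoints come from the abstract (0% error at 100% call rate,
    13.55% error at 50% call rate); this is the straight line through them,
    not the authors' fitted regression coefficients.
    """
    slope = 0.1355 / 0.5  # error added per unit of call rate lost (0.271)
    return slope * (1.0 - call_rate)


def simulate_genotyping(genotypes, call_rate, error_rate, rng):
    """Return a copy of 0/1/2 genotypes with simulated failures and errors.

    Each genotype fails (becomes MISSING) with probability 1 - call_rate;
    each genotype that is called is replaced by one of the two other
    genotype codes with probability error_rate.
    """
    observed = []
    for g in genotypes:
        if rng.random() >= call_rate:
            observed.append(MISSING)
        elif rng.random() < error_rate:
            observed.append(rng.choice([x for x in (0, 1, 2) if x != g]))
        else:
            observed.append(g)
    return observed


def five_fold_split(n_animals, rng, k=5):
    """Randomly assign animal indices to k cross-validation folds."""
    ids = list(range(n_animals))
    rng.shuffle(ids)
    return [ids[i::k] for i in range(k)]


# Usage: scenario 1 draws the two rates independently;
# scenario 2 derives the error rate from the call rate.
rng = random.Random(42)
true_genotypes = [rng.choice((0, 1, 2)) for _ in range(10_000)]
obs_scenario1 = simulate_genotyping(true_genotypes, 0.90, 0.05, rng)
cr = 0.80
obs_scenario2 = simulate_genotyping(true_genotypes, cr, scenario2_error_rate(cr), rng)
folds = five_fold_split(20_116, rng)
```

Keeping the call-rate and error-rate draws separate makes scenario 1 a matter of choosing the two rates freely, while scenario 2 only changes how `error_rate` is computed.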
In scenario 2, however, where the original genotyping call rate was negatively correlated with the genotyping error rate, imputation error increased as the original genotyping error rate rose. In both scenarios, the genotyping call rate needed to be at least 0.90 to obtain a genotype imputation accuracy of 98% or higher. These results can provide guidance for establishing quality assurance criteria for SNP genotyping in practice.
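The accuracy measure and the 0.90 call-rate threshold can be illustrated with a short sketch. It assumes, without confirmation from the abstract, that imputation accuracy is the proportion of masked genotypes whose imputed value matches the true genotype, and that call rate is the proportion of non-missing genotypes in a vector (whether per SNP or per animal is not specified here). The names `imputation_accuracy`, `call_rate`, and `passes_call_rate_qc` are hypothetical.

```python
MISSING = -1  # hypothetical code for a failed (uncalled) genotype


def imputation_accuracy(true, imputed, masked):
    """Fraction of masked positions whose imputed genotype equals the truth."""
    if not masked:
        raise ValueError("no masked positions to evaluate")
    hits = sum(1 for i in masked if imputed[i] == true[i])
    return hits / len(masked)


def call_rate(genotypes):
    """Proportion of genotypes in the vector that were successfully called."""
    return sum(1 for g in genotypes if g != MISSING) / len(genotypes)


def passes_call_rate_qc(genotypes, threshold=0.90):
    """Quality-control rule: keep the vector only if its call rate meets the threshold."""
    return call_rate(genotypes) >= threshold


# Usage: positions 1-3 were hidden before imputation; 2 of 3 are recovered.
true = [0, 1, 2, 1, 0]
imputed = [0, 1, 1, 1, 0]
acc = imputation_accuracy(true, imputed, masked=[1, 2, 3])
meets_target = acc >= 0.98  # the 98% accuracy target from the abstract
```

Applying the threshold as a `>=` comparison mirrors the wording "no less than 0.90"; a vector with exactly 90% called genotypes is retained.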