Department of Animal Sciences, Georg-August-University Göttingen, 37077, Göttingen, Germany.
Center for Integrated Breeding Research (CiBreed), Georg-August-University Göttingen, 37075, Göttingen, Germany.
Genet Sel Evol. 2022 Jul 4;54(1):49. doi: 10.1186/s12711-022-00740-8.
Genotype imputation is a cost-effective method to generate sequence-level genotypes for a large number of animals. Its application can improve the power of genomic studies, provided that the accuracy of imputation is sufficiently high. The purpose of this study was to develop an optimal strategy for genotype imputation from genotyping array data to sequence level in German warmblood horses, and to investigate the effect of different factors on the accuracy of imputation. Publicly available whole-genome sequence data from 317 horses of 46 breeds were used to conduct the analyses.
Depending on the size and composition of the reference panel, the accuracy of imputation from medium marker density (60K) to sequence level using the software Beagle 5.1 ranged from 0.64 to 0.70 for horse chromosome 3. Generally, imputation accuracy increased as the size of the reference panel increased, but if genetically distant individuals were included in the panel, the accuracy dropped. Imputation was most precise when using a reference panel of multiple but related breeds and the software Beagle 5.1, which outperformed the other two tested computer programs, Impute 5 and Minimac 4. Genome-wide imputation for this scenario resulted in a mean accuracy of 0.66. Stepwise imputation from 60K to 670K markers and subsequently to sequence level did not improve the accuracy of imputation. However, imputation from higher density (670K) was considerably more accurate (about 0.90) than from medium density. Likewise, imputation in genomic regions with low marker coverage was less accurate.
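The accuracy values above (0.64 to 0.70, 0.66, about 0.90) are not defined in this abstract. A common definition in imputation studies is the correlation between the true genotypes of masked variants and the imputed genotype dosages. The sketch below assumes that definition (per-variant Pearson correlation, averaged over variants, with monomorphic variants skipped because the correlation is undefined for them); it is an illustration only and not necessarily the exact measure used in the study. The function names `pearson` and `mean_imputation_accuracy` are hypothetical.

```python
import math
from typing import List, Optional, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> Optional[float]:
    """Pearson correlation of two equal-length vectors; None if either is constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return None  # correlation is undefined for monomorphic (constant) vectors
    return cov / math.sqrt(var_x * var_y)


def mean_imputation_accuracy(true_gt: List[List[int]], imputed: List[List[float]]) -> float:
    """Average per-variant correlation between true genotypes and imputed dosages.

    Assumed layout (hypothetical): rows are animals, columns are variants;
    true genotypes are coded 0/1/2 (count of alternative alleles) and
    imputed values are dosages in [0, 2].
    """
    n_variants = len(true_gt[0])
    scores = []
    for j in range(n_variants):
        truth = [row[j] for row in true_gt]
        dosage = [row[j] for row in imputed]
        r = pearson(truth, dosage)
        if r is not None:  # skip variants where the correlation is undefined
            scores.append(r)
    if not scores:
        raise ValueError("no variant with a defined correlation")
    return sum(scores) / len(scores)
```

For example, calling `mean_imputation_accuracy(true_gt, imputed)` with three animals and three variants, one of them monomorphic, averages the correlations of the two polymorphic variants only. Averaging per variant rather than pooling all genotypes keeps common variants from dominating the summary, but other choices (per-animal correlation, concordance rate) are also used in the literature and give different numbers.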
The accuracy of imputation in horses was influenced by the size and composition of the reference panel, the marker density of the genotyping array, and the imputation software. Genotype imputation can be used to extend the limited amount of available sequence-level data from horses in order to boost the power of downstream analyses, such as genome-wide association studies, or the detection of embryonic lethal variants.