CRV BV, P.O. Box 454, 6800 AL Arnhem, The Netherlands.
Genet Sel Evol. 2014 Feb 4;46(1):10. doi: 10.1186/1297-9686-46-10.
Imputing genotypes from low-density to higher-density chips is a cost-effective way to obtain high-density genotypes for many animals while genotyping only a relatively small subset of animals (the reference population) on the high-density chip. Several factors influence the accuracy of imputation, and our objective was to investigate the effects of the size of the reference population used for imputation, of the imputation method, and of that method's parameters. Genotypes were imputed from 50,000 (moderate-density) to 777,000 (high-density) SNPs (single nucleotide polymorphisms).
The effect of reference population size was studied in two datasets: one with 548 and one with 1289 Holstein animals, genotyped with the Illumina BovineHD chip (777 k SNPs). A third dataset included the 548 animals genotyped with the 777 k SNP chip plus 2200 animals genotyped with the Illumina BovineSNP50 chip. In each dataset, 60 animals were chosen as validation animals; all of their high-density genotypes were masked except those for the Illumina BovineSNP50 markers. Imputation was studied on a subset of six chromosomes using the imputation programs Beagle and DAGPHASE.
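The masking step for validation animals can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: genotypes are assumed to be coded 0/1/2 (allele counts), masked genotypes are marked with a hypothetical sentinel `MISSING = -1`, and validation animals are drawn at random, whereas the abstract does not say how the 60 animals were selected. The function name `mask_validation` is hypothetical.

```python
import random

MISSING = -1  # hypothetical sentinel for a masked (to-be-imputed) genotype


def mask_validation(genotypes, on_ld_chip, n_validation, seed=0):
    """Mask high-density genotypes of randomly chosen validation animals.

    genotypes:   one list per animal, one 0/1/2 allele count per HD SNP.
    on_ld_chip:  one bool per HD SNP, True if that SNP is also on the 50k chip.
    Returns (masked copy of genotypes, sorted indices of validation animals).
    """
    # Assumption: random selection; the source does not state the criterion.
    rng = random.Random(seed)
    validation = sorted(rng.sample(range(len(genotypes)), n_validation))
    # Copy every row so the true genotypes stay available for validation.
    masked = [row[:] for row in genotypes]
    for i in validation:
        # Keep only genotypes at SNPs shared with the 50k chip; mask the rest.
        masked[i] = [g if keep else MISSING
                     for g, keep in zip(genotypes[i], on_ld_chip)]
    return masked, validation


# Toy example: 5 animals, 4 HD SNPs, of which SNPs 0 and 2 are on the 50k chip.
hd = [[0, 1, 2, 1], [2, 2, 1, 0], [1, 0, 0, 2], [0, 0, 1, 1], [2, 1, 2, 0]]
masked, validation = mask_validation(hd, [True, False, True, False], n_validation=2)
print(validation, masked)
```

The non-validation animals keep their full high-density genotypes and serve as the reference; the held-back true genotypes of the validation animals are later compared with the imputed ones.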
In the dataset with 548 high-density genotypes, imputation with DAGPHASE and Beagle resulted in allelic imputation error rates of 1.91% and 0.87% when the scale and shift parameters were 2.0 and 0.1, and 1.0 and 0.0, respectively. When Beagle was used alone, the imputation error rate was 0.67%. When the information obtained by Beagle was subsequently used in DAGPHASE, the imputation error rate was slightly higher (0.71%). When 2200 moderate-density genotypes were added and Beagle was used alone, the imputation error rate was slightly lower (0.64%). The lowest imputation error rate (0.41%) was obtained with Beagle in the reference set with 1289 high-density genotypes.
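The abstract reports allelic imputation error rates without defining them. A common definition, assumed here rather than taken from the paper, counts wrong alleles over all imputed alleles: with genotypes coded 0/1/2, an imputed genotype that differs from the true one by 1 contributes one allelic error, and a difference of 2 contributes two. The Python sketch below computes that rate; the function name `allelic_error_rate` and the layout of the `masked` flags are hypothetical.

```python
def allelic_error_rate(true_genotypes, imputed_genotypes, masked):
    """Fraction of imputed alleles that differ from the true alleles.

    All three arguments are per-animal lists over the same SNPs; genotypes are
    0/1/2 allele counts, and masked[a][s] is True where the genotype was imputed.
    """
    wrong_alleles = 0
    imputed_alleles = 0
    for t_row, i_row, m_row in zip(true_genotypes, imputed_genotypes, masked):
        for true_g, imp_g, was_masked in zip(t_row, i_row, m_row):
            if was_masked:
                wrong_alleles += abs(true_g - imp_g)  # 0, 1 or 2 wrong alleles
                imputed_alleles += 2                  # diploid: 2 alleles per genotype
    return wrong_alleles / imputed_alleles


true_g = [[0, 1, 2, 1]]
imputed = [[0, 2, 0, 1]]
print(allelic_error_rate(true_g, imputed, [[True, True, True, True]]))  # 0.375
```

Under this definition, the 0.67% reported for Beagle alone would correspond to about 67 wrong alleles per 10,000 imputed alleles.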
For imputation of genotypes from the 50 k to the 777 k SNP chip, Beagle gave the lowest allelic imputation error rates. Imputation error rates decreased as the size of the reference population increased. For applications in which computing time is limiting, DAGPHASE using information from Beagle can be considered as an alternative, since it reduces computation time and increases imputation error rates only slightly.