China Agricultural University, Beijing, China.
Animal. 2013 May;7(5):729-35. doi: 10.1017/S1751731112002224. Epub 2012 Dec 11.
Imputation of high-density genotypes from low- or medium-density platforms is a promising way to enhance the efficiency of whole-genome selection programs at low cost. In this study, we compared the efficiency of three widely used imputation algorithms (fastPHASE, BEAGLE and findhap) using Chinese Holstein cattle with Illumina BovineSNP50 genotypes. A total of 2108 cattle were randomly divided into a reference population and a test population to evaluate the influence of the reference population size. Three bovine chromosomes, BTA1, 16 and 28, were used to represent large, medium and small chromosome sizes, respectively. We simulated different scenarios by randomly masking 20%, 40%, 80% and 95% of the single-nucleotide polymorphisms (SNPs) on each chromosome in the test population to mimic SNP panels of different densities. Illumina Bovine3K and Illumina BovineLD (6909 SNPs) information was also used. The three methods showed comparable accuracy when the proportion of masked SNPs was low, but the differences between them grew as more SNPs were masked. BEAGLE performed the best and was the most robust, with imputation accuracies >90% in almost all situations. fastPHASE was sensitive to the proportion of masked SNPs, especially when the masked SNP rate was high. findhap ran the fastest; its accuracies were lower than those of BEAGLE but higher than those of fastPHASE. In addition, enlarging the reference population improved the imputation accuracy of BEAGLE and findhap, but did not affect that of fastPHASE. Considering imputation accuracy and computational requirements, BEAGLE was the most reliable of the three methods for imputing genotypes from low- to high-density genotyping platforms.
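The masking design described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline and does not call fastPHASE, BEAGLE or findhap. It assumes genotypes coded as 0/1/2, a hypothetical `MISSING` code of -1, and accuracy defined as the fraction of masked genotypes recovered exactly (the abstract does not state the exact accuracy measure). The imputation step is a deliberately naive stand-in that fills each masked SNP with the most common genotype in the reference population; all function names are illustrative.

```python
import random
from collections import Counter

MISSING = -1  # hypothetical code for a masked genotype (assumption)


def choose_masked_snps(n_snps, mask_rate, rng):
    """Randomly pick the SNP indices to hide, e.g. mask_rate=0.8 for 80%."""
    n_masked = round(n_snps * mask_rate)
    return sorted(rng.sample(range(n_snps), n_masked))


def mask_panel(test_genotypes, masked_snps):
    """Return a copy of the test genotypes with the chosen SNPs set to MISSING."""
    masked = set(masked_snps)
    return [
        [MISSING if j in masked else g for j, g in enumerate(animal)]
        for animal in test_genotypes
    ]


def impute_by_reference_mode(masked_test, reference):
    """Naive stand-in for a real imputation program (NOT BEAGLE etc.):
    fill each missing genotype with the most common reference genotype
    at that SNP."""
    modes = [Counter(column).most_common(1)[0][0] for column in zip(*reference)]
    return [
        [modes[j] if g == MISSING else g for j, g in enumerate(animal)]
        for animal in masked_test
    ]


def imputation_accuracy(true_test, imputed_test, masked_snps):
    """Fraction of masked genotypes that were imputed exactly right."""
    total = 0
    correct = 0
    for true_row, imputed_row in zip(true_test, imputed_test):
        for j in masked_snps:
            total += 1
            correct += true_row[j] == imputed_row[j]
    return correct / total if total else float("nan")


if __name__ == "__main__":
    rng = random.Random(0)
    n_snps = 100
    # Simulated genotypes, for illustration only (not data from the study).
    reference = [[rng.choice([0, 1, 2]) for _ in range(n_snps)] for _ in range(50)]
    test = [[rng.choice([0, 1, 2]) for _ in range(n_snps)] for _ in range(10)]
    for rate in (0.2, 0.4, 0.8, 0.95):
        masked_snps = choose_masked_snps(n_snps, rate, rng)
        masked_test = mask_panel(test, masked_snps)
        imputed = impute_by_reference_mode(masked_test, reference)
        acc = imputation_accuracy(test, imputed, masked_snps)
        print(f"mask rate {rate:.0%}: accuracy {acc:.3f}")
```

In a real comparison, `impute_by_reference_mode` would be replaced by a run of the imputation program under test, while the masking and scoring steps stay the same, so that all methods are evaluated on identical masked SNP sets.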