Centre for Quantitative Genetics and Genomics, Department of Molecular Biology and Genetics, Aarhus University, DK 8830 Tjele, Denmark.
J Dairy Sci. 2013 Jul;96(7):4666-77. doi: 10.3168/jds.2012-6316. Epub 2013 May 16.
This study compared the imputation accuracy of several methods as a function of minor allele frequency (MAF) and of the relatedness between individuals in the reference and test data sets. Two data sets from the combined population of Swedish and Finnish Red Cattle were used to test the influence of these factors on imputation accuracy. Data set 1 consisted of 2,931 reference bulls and 971 test bulls, and was used to validate imputation from 3,000 markers (3K) to 54,000 markers (54K). Data set 2 contained 341 bulls in the reference set and 117 in the test set, and was used to validate imputation from 54K to high density [777,000 markers (777K)]. Both test sets were divided into 4 groups according to their relationship to the reference population. Five imputation methods (Beagle, IMPUTE2, findhap, AlphaImpute, and FImpute) were evaluated. Imputation accuracy was measured as the allele correct rate and as the correlation between imputed and true genotypes. Accuracy was lower when imputing from 3K to 54K than from 54K to 777K. Across methods, the allele correct rate ranged from 93.5 to 97.1% when imputing from 3K to 54K, and from 97.1 to 99.3% when imputing from 54K to 777K. When imputing from 3K to 54K, IMPUTE2 and Beagle were more accurate and more robust under various conditions than the other 3 methods. When imputing from 54K to high density, the accuracy of FImpute was similar to that of Beagle and IMPUTE2, and higher than that of the remaining 2 methods. A closer relationship between the test set and the reference set led to higher accuracy for all methods. In addition, as the minor allele frequency decreased, the allele correct rate increased whereas the correlation coefficient decreased.
The results indicate that Beagle and IMPUTE2 provide the most robust and accurate imputation, but when computing time and memory usage are taken into account, FImpute is a competitive alternative.
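The two accuracy measures named in the abstract can behave differently at low minor allele frequency, and a small worked example may make that easier to see. The sketch below is illustrative only and is not the authors' code: it assumes genotypes are coded as 0, 1, or 2 copies of one allele, assumes the allele correct rate counts 2 - |true - imputed| correct alleles per genotype, and uses toy genotype vectors invented for this example (the function and variable names are hypothetical).

```python
from math import sqrt


def allele_correct_rate(true_gt, imputed_gt):
    """Share of alleles imputed correctly, for genotypes coded 0/1/2.

    Assumption: each genotype carries 2 alleles and |true - imputed|
    of them are counted as wrong.
    """
    if not true_gt or len(true_gt) != len(imputed_gt):
        raise ValueError("genotype vectors must be non-empty and equal length")
    wrong_alleles = sum(abs(t - i) for t, i in zip(true_gt, imputed_gt))
    return 1.0 - wrong_alleles / (2 * len(true_gt))


def genotype_correlation(true_gt, imputed_gt):
    """Pearson correlation between true and imputed genotype codes."""
    if not true_gt or len(true_gt) != len(imputed_gt):
        raise ValueError("genotype vectors must be non-empty and equal length")
    n = len(true_gt)
    mean_t = sum(true_gt) / n
    mean_i = sum(imputed_gt) / n
    cov = sum((t - mean_t) * (i - mean_i) for t, i in zip(true_gt, imputed_gt))
    var_t = sum((t - mean_t) ** 2 for t in true_gt)
    var_i = sum((i - mean_i) ** 2 for i in imputed_gt)
    if var_t == 0 or var_i == 0:
        return float("nan")  # undefined when either vector is constant
    return cov / sqrt(var_t * var_i)


# Toy marker with a rare allele: 2 heterozygotes among 20 animals.
rare_true = [0] * 18 + [1, 1]
rare_imputed = [0] * 17 + [1, 1, 0]  # one false and one missed heterozygote

# Toy marker with a common allele, also with exactly 2 wrong alleles.
common_true = [0] * 5 + [1] * 10 + [2] * 5
common_imputed = [1] + [0] * 4 + [1] * 10 + [2] * 4 + [1]

rare_rate = allele_correct_rate(rare_true, rare_imputed)
common_rate = allele_correct_rate(common_true, common_imputed)
rare_r = genotype_correlation(rare_true, rare_imputed)
common_r = genotype_correlation(common_true, common_imputed)

print(f"rare marker:   correct rate {rare_rate:.3f}, correlation {rare_r:.3f}")
print(f"common marker: correct rate {common_rate:.3f}, correlation {common_r:.3f}")
```

In this toy case both markers have the same allele correct rate (0.95), yet the correlation is about 0.44 for the rare marker and about 0.89 for the common one. With little genotype variance at a rare marker, a few errors remove most of the signal the correlation measures, which is consistent with the reported drop in correlation at low minor allele frequency. The example does not reproduce the study's data or its exact accuracy definitions.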