Department of Animal Science, Faculty of Agricultural and Veterinary Sciences, Sao Paulo State University (UNESP), Jaboticabal, SP, Brazil.
Department of Animal Sciences, University of Wisconsin, Madison, WI.
J Anim Sci. 2020 Jun 1;98(6). doi: 10.1093/jas/skaa179.
The aim of this study was to compare the predictive performance of the Genomic Best Linear Unbiased Predictor (GBLUP) and machine learning methods (Random Forest, RF; Support Vector Machine, SVM; Artificial Neural Network, ANN) in simulated populations with different levels of dominance effects. The simulated genome comprised 50k SNP and 300 QTL, all biallelic and randomly distributed across 29 autosomes. Six traits were simulated with different values of narrow- and broad-sense heritability. In the purely additive scenario with low heritability (h2 = 0.10), the predictive ability of GBLUP was slightly higher than that of the other methods, whereas ANN provided the highest accuracies in the scenarios with moderate heritability (h2 = 0.30). The accuracy of dominance deviation predictions ranged from 0.180 to 0.350 for GBLUP extended to include dominance effects (GBLUP-D) and from 0.06 to 0.185 for RF, and was zero for the ANN and SVM methods. Although RF gave higher accuracies for predictions of the total genetic effect, its mean squared errors were worse than those of GBLUP-D in scenarios with large additive and dominance variances. When applied to prescreen important genomic regions, the RF approach detected QTL with large additive and/or dominance effects. Among the machine learning methods, only RF was able to capture dominance effects implicitly without increasing the number of covariates in the model, which resulted in higher accuracies for the total genetic and phenotypic values as the dominance ratio increased. Nevertheless, if the interest is in direct inference on dominance effects, GBLUP-D could be a more suitable method.
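As an illustration of how additive and dominance variances relate to the narrow- and broad-sense heritabilities mentioned above, the following is a minimal sketch based on the classical single-locus model for a biallelic QTL. It is not the simulation procedure used in the study. The function names, the example effect sizes, and the assumptions of Hardy-Weinberg equilibrium, linkage equilibrium between QTL, and no epistasis are all illustrative choices.

```python
def single_locus_variances(p, a, d):
    """Additive and dominance variance of one biallelic QTL.

    Assumes Hardy-Weinberg equilibrium (an assumption of this sketch).
    p: frequency of the allele whose homozygote has genotypic value +a
    a: additive effect (half the difference between the two homozygotes)
    d: dominance effect (heterozygote deviation from the homozygote midpoint)
    """
    q = 1.0 - p
    alpha = a + d * (q - p)            # average effect of allele substitution
    var_a = 2.0 * p * q * alpha ** 2   # additive genetic variance
    var_d = (2.0 * p * q * d) ** 2     # dominance variance
    return var_a, var_d


def heritabilities(qtl, var_e):
    """Narrow-sense (h2) and broad-sense (H2) heritability.

    qtl: iterable of (p, a, d) tuples, one per QTL
    var_e: residual (environmental) variance
    Per-locus variances are summed, which assumes linkage equilibrium
    between QTL and no epistasis (assumptions of this sketch).
    """
    var_a = 0.0
    var_d = 0.0
    for p, a, d in qtl:
        va, vd = single_locus_variances(p, a, d)
        var_a += va
        var_d += vd
    var_p = var_a + var_d + var_e      # phenotypic variance
    return var_a / var_p, (var_a + var_d) / var_p


# Hypothetical example: three QTL with arbitrary frequencies and effects.
example_qtl = [(0.5, 1.0, 0.5), (0.3, 0.4, 0.4), (0.8, 0.2, 0.0)]
h2, H2 = heritabilities(example_qtl, var_e=2.0)
print(f"narrow-sense h2 = {h2:.3f}, broad-sense H2 = {H2:.3f}")
```

With d = 0 at every QTL the two heritabilities coincide, which corresponds to a purely additive scenario. As the dominance effects grow, H2 exceeds h2, and the gap is the share of phenotypic variance that a purely additive model cannot capture.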