Rao Yao, Zhang Lilian, Gao Lutao, Wang Shuran, Yang Linnan
College of Big Data, Yunnan Agricultural University, Kunming 650201, China.
Yunnan Engineering Technology Research Center of Agricultural Big Data, Kunming 650201, China.
Animals (Basel). 2025 Apr 18;15(8):1172. doi: 10.3390/ani15081172.
Machine learning has attracted considerable attention in genomic prediction because of its strong predictive capabilities, yet the lack of interpretability in model decisions remains a major challenge. In this study, we propose a novel machine learning method, ExAutoGP, which aims to improve the accuracy of genomic prediction and enhance model transparency by combining automated machine learning (AutoML) with SHapley Additive exPlanations (SHAP). To evaluate ExAutoGP's effectiveness, we designed a comparative experiment using one simulated dataset and two real animal datasets. For each dataset, we applied ExAutoGP and five baseline models: Genomic Best Linear Unbiased Prediction (GBLUP), BayesB, Support Vector Regression (SVR), Kernel Ridge Regression (KRR), and Random Forest (RF). All models were trained and evaluated using five repeats of five-fold cross-validation, and their performance was assessed in terms of both predictive accuracy and computational efficiency. The results show that ExAutoGP delivers robust and excellent prediction performance on all datasets. In addition, the SHAP method not only reveals the decision-making process of ExAutoGP and enhances its interpretability, but also identifies genetic markers closely related to the traits. This study demonstrates the strong potential of AutoML in genomic prediction, while the introduction of SHAP provides actionable biological insights. The combination of high prediction accuracy and interpretability offers new perspectives for optimizing genomic selection strategies in livestock and poultry breeding.
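The evaluation protocol named in the abstract, five repeats of five-fold cross-validation, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the abstract does not say how predictive accuracy was measured, so the Pearson correlation between predicted and observed phenotypes (a common choice in genomic prediction) is assumed here; the simulated genotypes, marker effects, and the helper names `repeated_kfold`, `fit_marginal_effects`, and `predict` are all hypothetical; and the toy per-marker regression is only a placeholder, not ExAutoGP or any of the five baseline models.

```python
import random
from statistics import mean


def repeated_kfold(n_samples, n_splits=5, n_repeats=5, seed=0):
    """Yield (train_idx, test_idx) pairs: n_repeats random partitions into n_splits folds."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        order = list(range(n_samples))
        rng.shuffle(order)  # a fresh random partition for every repeat
        for fold in range(n_splits):
            test = order[fold::n_splits]
            held_out = set(test)
            train = [i for i in order if i not in held_out]
            yield train, test


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def fit_marginal_effects(X, y, train):
    """Toy stand-in model: one least-squares slope per marker, fitted independently."""
    y_mean = mean(y[i] for i in train)
    col_means, slopes = [], []
    for j in range(len(X[0])):
        col = [X[i][j] for i in train]
        m = mean(col)
        sxx = sum((x - m) ** 2 for x in col)
        sxy = sum((X[i][j] - m) * (y[i] - y_mean) for i in train)
        col_means.append(m)
        slopes.append(sxy / sxx if sxx > 0 else 0.0)  # guard monomorphic markers
    return y_mean, col_means, slopes


def predict(model, row):
    """Predict one phenotype from a genotype row using the fitted toy model."""
    y_mean, col_means, slopes = model
    return y_mean + sum(b * (x - m) for b, x, m in zip(slopes, row, col_means))


# Hypothetical simulated data: 100 individuals, 20 markers coded 0/1/2.
rng = random.Random(42)
n_individuals, n_markers = 100, 20
X = [[rng.choice((0, 1, 2)) for _ in range(n_markers)] for _ in range(n_individuals)]
effects = [rng.gauss(0, 1) for _ in range(n_markers)]
y = [sum(b * x for b, x in zip(effects, row)) + rng.gauss(0, 1) for row in X]

# Five repeats of five-fold cross-validation: 25 train/test evaluations in total.
splits = list(repeated_kfold(n_individuals, n_splits=5, n_repeats=5, seed=1))
accuracies = []
for train, test in splits:
    model = fit_marginal_effects(X, y, train)  # fit on training individuals only
    predicted = [predict(model, X[i]) for i in test]
    observed = [y[i] for i in test]
    accuracies.append(pearson(predicted, observed))

print(f"folds evaluated: {len(accuracies)}")
print(f"mean accuracy (Pearson r): {mean(accuracies):.3f}")
```

Reshuffling before each repeat means every individual is held out exactly once per repeat, and averaging over all 25 folds reduces the dependence of the reported accuracy on any single random partition. Any model exposing a fit step and a predict step could be substituted for the toy regression.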