Meher Prabina Kumar, Pradhan Upendra Kumar, Ray Mrinmoy, Gupta Ajit, Parsad Rajender, Gupta Pushpendra Kumar
Division of Statistical Genetics, ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi 110012, India.
Division of Forecasting and Agricultural Systems Modeling, ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi 110012, India.
G3 (Bethesda). 2025 Sep 3;15(9). doi: 10.1093/g3journal/jkaf150.
This study proposes a weight-optimization-based ensemble framework aimed at improving genomic prediction accuracy. The framework combines 8 Bayesian models (BayesA, BayesB, BayesC, BayesBpi, BayesCpi, BayesR, BayesL, and BayesRR), with the weight assigned to each model optimized using a genetic algorithm. The performance of the ensemble model, named EnBayes, was evaluated on 18 datasets from 4 crop species and showed improved prediction accuracy compared to the individual Bayesian models. New objective functions were proposed to improve prediction accuracy in terms of both Pearson's correlation coefficient and mean squared error. The accuracy of the ensemble model was found to be associated with the number of models considered in the framework: a few more accurate models achieved accuracy similar to that of a larger number of less accurate models. Additionally, the inclusion of over-biased and under-biased models influenced the bias of the ensemble model. The study also explored a meta-learning approach using the Bayesian models as base learners and random forest, quantile regression forest, and ridge regression as meta-learners; the EnBayes model outperformed this approach. When the traditional genomic prediction models GBLUP and rrBLUP and the machine learning models support vector machine, random forest, extreme gradient boosting, and light gradient boosting were included in the ensemble framework in addition to the Bayesian models, the ensemble model achieved higher accuracy than the individual Bayesian, BLUP, and machine learning models. We believe that EnBayes will contribute significantly to ongoing efforts to improve genomic prediction accuracy.
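The weight-optimization idea can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the objective functions, so the fitness used here (Pearson's correlation minus mean squared error) is an assumption, the function names `objective`, `normalize`, and `ga_weights` are hypothetical, and the base-model predictions are simulated rather than produced by Bayesian models.

```python
import numpy as np

def objective(weights, preds, y):
    """Assumed fitness (not from the paper): Pearson's r minus MSE; higher is better."""
    y_hat = preds @ weights
    r = np.corrcoef(y_hat, y)[0, 1]
    mse = np.mean((y_hat - y) ** 2)
    return r - mse

def normalize(w):
    """Force weights to be non-negative and to sum to one along the last axis."""
    w = np.clip(w, 1e-12, None)
    return w / w.sum(axis=-1, keepdims=True)

def ga_weights(preds, y, pop_size=60, generations=100, mut_sd=0.05, seed=0):
    """Simple genetic algorithm: elitist selection, blend crossover, Gaussian mutation."""
    rng = np.random.default_rng(seed)
    n_models = preds.shape[1]
    pop = rng.dirichlet(np.ones(n_models), size=pop_size)
    pop[0] = 1.0 / n_models  # seed one individual with equal weights
    n_elite = pop_size // 2
    for _ in range(generations):
        fitness = np.array([objective(w, preds, y) for w in pop])
        elite = pop[np.argsort(fitness)[::-1][:n_elite]]  # keep the best half
        n_children = pop_size - n_elite
        pa = elite[rng.integers(n_elite, size=n_children)]
        pb = elite[rng.integers(n_elite, size=n_children)]
        alpha = rng.random((n_children, 1))
        children = alpha * pa + (1 - alpha) * pb           # blend crossover
        children += rng.normal(0.0, mut_sd, children.shape)  # mutation
        pop = np.vstack([elite, normalize(children)])
    fitness = np.array([objective(w, preds, y) for w in pop])
    return pop[np.argmax(fitness)]

# Simulated example: 4 hypothetical base models with increasing prediction noise.
rng = np.random.default_rng(1)
y = rng.normal(size=200)
noise_sd = np.array([0.3, 0.6, 1.0, 1.5])
preds = y[:, None] + rng.normal(size=(200, 4)) * noise_sd

equal = np.full(4, 0.25)
w = ga_weights(preds, y)
print("optimized weights:", np.round(w, 3))
print("fitness (equal weights):", objective(equal, preds, y))
print("fitness (optimized):   ", objective(w, preds, y))
```

Because the best half of each generation is carried over unchanged and the equal-weight vector is in the initial population, the optimized fitness in this sketch can never fall below the equal-weight baseline on the data used for optimization. In practice the weights would be tuned on a validation fold and assessed on held-out data.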