Enhancing genomic prediction with Stacking Ensemble Learning in Arabica Coffee.

Author information

Nascimento Moyses, Nascimento Ana Carolina Campana, Azevedo Camila Ferreira, de Oliveira Antonio Carlos Baiao, Caixeta Eveline Teixeira, Jarquin Diego

Affiliations

Laboratory of Intelligence Computational and Statistical Learning (LICAE), Department of Statistics, Federal University of Viçosa, Viçosa, Brazil.

Agronomy Department, University of Florida, Gainesville, FL, United States.

Publication information

Front Plant Sci. 2024 Jul 17;15:1373318. doi: 10.3389/fpls.2024.1373318. eCollection 2024.

Abstract

Coffee breeding programs have traditionally relied on observing plant characteristics over many years, a slow and costly process. Genomic selection (GS) offers a DNA-based alternative for faster selection of superior cultivars. Stacking Ensemble Learning (SEL) combines multiple models for potentially even more accurate selection. This study explores the potential of SEL in coffee breeding, aiming to improve prediction accuracy for important traits [yield (YL), total number of fruits (NF), leaf miner infestation (LM), and cercosporiosis incidence (Cer)] in Coffea arabica. We analyzed data from 195 individuals genotyped for 21,211 single-nucleotide polymorphism (SNP) markers. To comprehensively assess model performance, we employed a cross-validation (CV) scheme. Genomic Best Linear Unbiased Prediction (GBLUP), Multivariate Adaptive Regression Splines (MARS), Quantile Random Forest (QRF), and Random Forest (RF) served as base learners. For the meta-learner within the SEL framework, various options were explored, including Ridge Regression, RF, GBLUP, and Single Average. SEL was able to predict the important traits in Coffea arabica, and its predictive ability (PA) was higher than that of every base learner. The gains in PA relative to GBLUP (computed from the ratio between the PA of the best stacking model and that of GBLUP) were 87.44%, 37.83%, 199.82%, and 14.59% for YL, NF, LM, and Cer, respectively. Overall, SEL presents a promising approach for GS. By combining predictions from multiple models, SEL can potentially enhance the PA of GS for complex traits.
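
As a rough illustration of the stacking workflow summarized in the abstract (base learners, a meta-learner trained on their cross-validated predictions, and PA measured on held-out individuals), the following Python sketch may help. It is not the authors' code and rests on several assumptions: the genotypes and phenotypes are simulated, ridge regression on markers stands in for GBLUP, k-nearest-neighbour regression stands in for the nonlinear learners (MARS, QRF, RF), and PA is taken to be the Pearson correlation between predicted and observed values. All function names (`ridge_predict`, `knn_predict`, `out_of_fold`, `run_demo`) are hypothetical.

```python
import numpy as np


def ridge_predict(X_train, y_train, X_test, lam=1.0):
    """Ridge regression in dual form (an RR-BLUP-style stand-in for GBLUP)."""
    mu_x, mu_y = X_train.mean(axis=0), y_train.mean()
    Xc = X_train - mu_x
    # Solve an n x n system, which is cheap when markers outnumber individuals.
    alpha = np.linalg.solve(Xc @ Xc.T + lam * np.eye(len(y_train)), y_train - mu_y)
    return (X_test - mu_x) @ (Xc.T @ alpha) + mu_y


def knn_predict(X_train, y_train, X_test, k=10):
    """k-nearest-neighbour regression (a simple nonlinear stand-in learner)."""
    dist = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(dist, axis=1)[:, :k]
    return y_train[nearest].mean(axis=1)


def out_of_fold(learner, X, y, folds):
    """Predict each fold from a model trained on the remaining folds."""
    oof = np.empty(len(y))
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        oof[test_idx] = learner(X[train_idx], y[train_idx], X[test_idx])
    return oof


def pearson(a, b):
    """Pearson correlation, used here as the predictive ability (assumption)."""
    return float(np.corrcoef(a, b)[0, 1])


def run_demo(seed=0, n=150, p=300):
    rng = np.random.default_rng(seed)
    # Simulated SNP codes (0/1/2) and a trait driven by a sparse set of markers.
    X = rng.integers(0, 3, size=(n, p)).astype(float)
    effects = rng.normal(size=p) * (rng.random(p) < 0.1)
    y = X @ effects + rng.normal(scale=2.0, size=n)

    # Hold out 30 individuals; split the rest into 5 inner folds.
    order = rng.permutation(n)
    test, train = order[:30], order[30:]
    folds = np.array_split(rng.permutation(len(train)), 5)

    learners = {"ridge": ridge_predict, "knn": knn_predict}
    # Level 1: out-of-fold predictions become the meta-learner's inputs.
    Z_train = np.column_stack(
        [out_of_fold(f, X[train], y[train], folds) for f in learners.values()]
    )
    # Base learners refit on all training individuals predict the held-out set.
    Z_test = np.column_stack(
        [f(X[train], y[train], X[test]) for f in learners.values()]
    )

    pa = {name: pearson(Z_test[:, j], y[test]) for j, name in enumerate(learners)}
    # Level 2: a ridge meta-learner and a plain average of base predictions.
    pa["stack_ridge"] = pearson(ridge_predict(Z_train, y[train], Z_test), y[test])
    pa["stack_average"] = pearson(Z_test.mean(axis=1), y[test])
    return pa


pa = run_demo()
for name, value in pa.items():
    print(f"{name}: PA = {value:.3f}")
```

Training the meta-learner on out-of-fold predictions, rather than on predictions for individuals the base learners have already seen, keeps it from simply rewarding whichever base learner overfits most. Whether stacking beats the base learners on this toy data depends on the simulation settings and says nothing about the results reported above.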

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2cae/11288849/2b11dd838195/fpls-15-1373318-g001.jpg
