A comparative study of 11 non-linear regression models highlighting autoencoder, DBN, and SVR, enhanced by SHAP importance analysis in soybean branching prediction.

Author information

Zhou Wei, Yan Zhengxiao, Zhang Liting

Affiliations

Florida Agricultural and Mechanical University, Tallahassee, FL, 32307, USA.

Florida State University, Tallahassee, FL, 32306, USA.

Publication information

Sci Rep. 2024 Mar 11;14(1):5905. doi: 10.1038/s41598-024-55243-x.

Abstract

To explore a robust tool for advancing digital breeding through an artificial intelligence-driven phenotype prediction expert system, we analyzed 11 non-linear regression models, with particular emphasis on Support Vector Regression (SVR) and SHapley Additive exPlanations (SHAP) for predicting soybean branching. Using branching data (phenotype) from 1918 soybean accessions and 42k single nucleotide polymorphism (SNP) data (genotype), this study systematically compared 11 non-linear regression AI models: four deep learning models (deep belief network (DBN) regression, artificial neural network (ANN) regression, autoencoder regression, and multilayer perceptron (MLP) regression) and seven machine learning models (SVR, eXtreme Gradient Boosting (XGBoost) regression, Random Forest regression, LightGBM regression, Gaussian process (GP) regression, Decision Tree regression, and Polynomial regression). Evaluated on four metrics, R² (R-squared), MAE (Mean Absolute Error), MSE (Mean Squared Error), and MAPE (Mean Absolute Percentage Error), SVR, Polynomial Regression, DBN, and Autoencoder outperformed the other models and achieved better accuracy in phenotype prediction. In assessing these approaches, we took the SVR model as an example and analyzed feature importance and gene ontology (GO) enrichment to provide supporting evidence. A comparison of four feature importance algorithms (Variable Ranking, Permutation, SHAP, and Correlation Matrix) showed no notable difference in their feature importance ranking scores, but because SHAP values also provide rich information on genes with negative contributions, SHAP importance was chosen for feature selection. The results of this study offer valuable insights into AI-mediated plant breeding and address challenges faced by traditional breeding programs. The method developed is broadly applicable to phenotype prediction, minor QTL (quantitative trait loci) mining, and plant smart-breeding systems, contributing to the advancement of AI-based breeding practices and to the transition from experience-based to data-based breeding.
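
The four evaluation metrics named in the abstract can be written out directly from their standard definitions. The following is a minimal sketch in plain Python, not the authors' code: the function name `regression_metrics` and the observed and predicted branch counts are hypothetical and chosen only for illustration.

```python
from math import fsum


def regression_metrics(y_true, y_pred):
    """Return R-squared, MAE, MSE, and MAPE (as a fraction) for paired values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = fsum(y_true) / n
    ss_res = fsum(e * e for e in errors)                 # residual sum of squares
    ss_tot = fsum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "mae": fsum(abs(e) for e in errors) / n,
        "mse": ss_res / n,
        "mape": fsum(abs(e / t) for e, t in zip(errors, y_true)) / n,
    }


# Hypothetical branch counts for four accessions (illustrative values only).
observed = [2.0, 4.0, 6.0, 8.0]
predicted = [2.5, 3.5, 6.5, 7.5]
metrics = regression_metrics(observed, predicted)
print(metrics)
```

Note that MAPE is undefined when an observed value is zero, and R² is undefined when all observed values are identical; this sketch does not guard against either case.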

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c9c9/10928191/6251f24612c9/41598_2024_55243_Fig1_HTML.jpg
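
The abstract's point that SHAP values expose genes with negative contributions follows from Shapley values being signed. The sketch below computes exact Shapley values for a toy three-SNP model in plain Python. It is an illustration under stated assumptions, not the authors' pipeline and not the SHAP library (which approximates these quantities for real models): `shapley_values`, `toy_model`, the genotype coding, and the all-zero reference genotype are all hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x relative to a baseline input.

    Features outside a coalition are set to their baseline value.
    Cost grows as 2**n, so this is only feasible for a handful of features.
    """
    n = len(x)
    values = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                without_i = [x[j] if j in subset else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                values[i] += weight * (predict(with_i) - predict(without_i))
    return values


# Hypothetical toy model: three SNPs coded 0/1/2, one with a negative effect,
# plus an interaction between the first and third SNP.
def toy_model(g):
    return 3.0 + 1.0 * g[0] - 0.5 * g[1] + 0.25 * g[0] * g[2]


genotype = [2, 2, 1]   # accession to explain (assumed values)
reference = [0, 0, 0]  # assumed baseline genotype
phi = shapley_values(toy_model, genotype, reference)
print(phi)
```

Here the second SNP receives a negative value, the interaction term is split equally between the first and third SNPs, and the values sum to the difference between the prediction for `genotype` and the prediction for `reference`.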
