A comparative study of 11 non-linear regression models highlighting autoencoder, DBN, and SVR, enhanced by SHAP importance analysis in soybean branching prediction.

Authors

Zhou Wei, Yan Zhengxiao, Zhang Liting

Affiliations

Florida Agricultural and Mechanical University, Tallahassee, FL, 32307, USA.

Florida State University, Tallahassee, FL, 32306, USA.

Publication information

Sci Rep. 2024 Mar 11;14(1):5905. doi: 10.1038/s41598-024-55243-x.

DOI:10.1038/s41598-024-55243-x

PMID:38467662

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC10928191/

Abstract

To explore a robust tool for advancing digital breeding practices through an artificial intelligence-driven phenotype prediction expert system, we undertook a thorough analysis of 11 non-linear regression models. Our investigation specifically emphasized the significance of Support Vector Regression (SVR) and SHapley Additive exPlanations (SHAP) in predicting soybean branching. Using branching data (phenotype) from 1918 soybean accessions and 42k single nucleotide polymorphism (SNP) data (genotype), this study systematically compared 11 non-linear regression AI models: four deep learning models (DBN (deep belief network) regression, ANN (artificial neural network) regression, Autoencoder regression, and MLP (multilayer perceptron) regression) and seven machine learning models (SVR (support vector regression), XGBoost (eXtreme Gradient Boosting) regression, Random Forest regression, LightGBM regression, GPs (Gaussian processes) regression, Decision Tree regression, and Polynomial regression). Evaluation with four metrics, R² (R-squared), MAE (Mean Absolute Error), MSE (Mean Squared Error), and MAPE (Mean Absolute Percentage Error), showed that SVR, Polynomial Regression, DBN, and Autoencoder outperformed the other models and achieved better accuracy in phenotype prediction. In the assessment of deep learning approaches, we used the SVR model as an example, conducting analyses of feature importance and gene ontology (GO) enrichment to provide comprehensive support. A comparison of four feature importance algorithms, namely Variable Ranking, Permutation, SHAP, and Correlation Matrix, found no notable difference in their feature importance ranking scores. However, SHAP values additionally provide rich information on genes with negative contributions, so SHAP importance was chosen for feature selection.
The results of this study offer valuable insights into AI-mediated plant breeding, addressing challenges faced by traditional breeding programs. The method developed has broad applicability in phenotype prediction, minor QTL (quantitative trait loci) mining, and plant smart-breeding systems, contributing significantly to the advancement of AI-based breeding practices and to the transition from experience-based to data-based breeding.
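
The four evaluation metrics named in the abstract have standard definitions, and a small worked example may help make them concrete. The sketch below is illustrative only: the function name `regression_metrics` and the branch counts are hypothetical, and nothing here reproduces the authors' code or results.

```python
def regression_metrics(y_true, y_pred):
    """Return R^2, MAE, MSE and MAPE for two equal-length sequences."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    return {
        "r2": 1.0 - ss_res / ss_tot,   # undefined if all observed values are equal
        "mae": sum(abs(e) for e in errors) / n,
        "mse": ss_res / n,
        # MAPE divides by the observed value, so it is undefined when any
        # observation is zero; returned as a fraction, not a percentage.
        "mape": sum(abs(e / t) for e, t in zip(errors, y_true)) / n,
    }


# Hypothetical observed and predicted branch counts (made-up numbers).
observed = [2.0, 4.0, 5.0, 4.0, 5.0]
predicted = [2.5, 3.5, 5.0, 4.5, 4.5]
metrics = regression_metrics(observed, predicted)
print(metrics)
```

Higher R² and lower MAE, MSE, and MAPE indicate a better fit. Note that MAPE cannot be computed for observations equal to zero, which matters for count-like traits.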

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c9c9/10928191/6251f24612c9/41598_2024_55243_Fig1_HTML.jpg
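
The abstract's remark that SHAP values carry information about negative contributions can be illustrated with exact Shapley values on a toy model. This is a minimal sketch under stated assumptions: it enumerates feature subsets directly rather than calling the `shap` library, and the model `toy_model`, its weights, the three SNP features, and the background rows are all hypothetical. It is not the SVR model or the data used in the study.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, background):
    """Exact Shapley values of predict at x.

    Features outside a coalition are replaced by each background row in
    turn and the predictions are averaged.
    """
    n = len(x)

    def value(subset):
        total = 0.0
        for row in background:
            mixed = [x[i] if i in subset else row[i] for i in range(n)]
            total += predict(mixed)
        return total / len(background)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        contrib = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                contrib += weight * (value(s | {i}) - value(s))
        phi.append(contrib)
    return phi


# Hypothetical toy model: three SNPs coded 0/1/2 with made-up weights.
def toy_model(snps):
    return 3.0 + 0.8 * snps[0] - 0.5 * snps[1] + 0.0 * snps[2]


background = [[0, 0, 0], [1, 1, 1], [2, 2, 2], [1, 0, 2]]
accession = [2, 2, 1]

phi = shapley_values(toy_model, accession, background)
baseline = sum(toy_model(row) for row in background) / len(background)
print(phi, baseline)
```

In this toy case the second SNP receives a negative value, meaning it pushes the prediction below the background average, while the first pushes it above. The values sum to the difference between the prediction for `accession` and the baseline. An unsigned importance ranking would report only the magnitudes.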
