A Comparative Analysis of XGBoost and Neural Network Models for Predicting Some Tomato Fruit Quality Traits from Environmental and Meteorological Data.

Author information

M'hamdi Oussama, Takács Sándor, Palotás Gábor, Ilahy Riadh, Helyes Lajos, Pék Zoltán

Affiliations

Institute of Horticultural Sciences, Hungarian University of Agriculture and Life Sciences, Páter K. Str. 1, 2100 Gödöllő, Hungary.

Doctoral School of Plant Science, Hungarian University of Agriculture and Life Sciences, Páter K. Str. 1, 2100 Gödöllő, Hungary.

Publication information

Plants (Basel). 2024 Mar 6;13(5):746. doi: 10.3390/plants13050746.

DOI: 10.3390/plants13050746

PMID: 38475592

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10934895/

Abstract

As a raw material for processing, the tomato is globally important, and it is pivotal in dietary and agronomic research because of its nutritional, economic, and health significance. This study explored the potential of machine learning (ML) for predicting tomato quality, using data from 48 cultivars and 28 locations in Hungary over 5 seasons. It focused on °Brix, lycopene content, and colour (a/b ratio), modelled with extreme gradient boosting (XGBoost) and artificial neural network (ANN) models. XGBoost consistently outperformed ANN, achieving high accuracy in predicting °Brix (R² = 0.98, RMSE = 0.07) and lycopene content (R² = 0.87, RMSE = 0.61), and excelling in colour prediction (a/b ratio) with an R² of 0.93 and an RMSE of 0.03. ANN lagged behind, particularly in colour prediction, where it showed a negative R² of -0.35. Shapley additive explanations (SHAP) summary plots indicated that both models are effective in predicting °Brix and lycopene content, while highlighting different aspects of the data. SHAP analysis also confirmed the models' efficiency (especially for °Brix and lycopene) and underscored the significant influence of cultivar choice and of environmental factors such as climate and soil. These findings emphasize the importance of selecting and fine-tuning the appropriate ML model to enhance precision agriculture, and they underline XGBoost's superiority in handling complex agronomic data for quality assessment.
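The abstract reports accuracy as R² and RMSE, including a negative R² for the ANN colour model. The sketch below assumes the conventional definitions, R² = 1 - SS_res/SS_tot and RMSE = sqrt(mean squared residual); the study's own evaluation code is not reproduced here. It computes both metrics in plain Python and shows why R² can fall below zero: the model's squared error exceeds that of simply predicting the mean of the observed values. The a/b ratios are invented for illustration and are not data from the study.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error, expressed in the same units as the target."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot.

    SS_tot is the squared error of always predicting the mean of y_true,
    so R^2 < 0 whenever the model does worse than that constant baseline.
    """
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical a/b colour ratios (illustrative only, NOT data from the study).
observed = [2.0, 2.2, 2.4, 2.6, 2.8]
good_model = [2.05, 2.15, 2.45, 2.55, 2.80]  # small errors around the truth
poor_model = [2.8, 2.0, 2.9, 2.1, 2.4]       # errors larger than the spread of the data

for name, pred in [("good", good_model), ("poor", poor_model)]:
    print(f"{name}: R2 = {r2(observed, pred):.3f}, RMSE = {rmse(observed, pred):.3f}")
```

Under these definitions, an R² of -0.35 means the residual sum of squares was 1.35 times the total sum of squares around the observed mean, i.e. the model predicted worse than a constant mean.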

Similar articles

Optimized Machine Learning Models for Predicting Core Body Temperature in Dairy Cows: Enhancing Accuracy and Interpretability for Practical Livestock Management.

Animals (Basel). 2024 Sep 20;14(18):2724. doi: 10.3390/ani14182724.

Valorization of tomato processing by-products: Predictive modeling and optimization for ultrasound-assisted lycopene extraction.

Ultrason Sonochem. 2024 Nov;110:107055. doi: 10.1016/j.ultsonch.2024.107055. Epub 2024 Aug 30.

A comparative study of 11 non-linear regression models highlighting autoencoder, DBN, and SVR, enhanced by SHAP importance analysis in soybean branching prediction.

Sci Rep. 2024 Mar 11;14(1):5905. doi: 10.1038/s41598-024-55243-x.

Seasonal prediction of daily PM concentrations with interpretable machine learning: a case study of Beijing, China.

Environ Sci Pollut Res Int. 2022 Jun;29(30):45821-45836. doi: 10.1007/s11356-022-18913-9. Epub 2022 Feb 12.

Artificial intelligence-based prediction of lycopene content in raw tomatoes using physicochemical attributes.

Phytochem Anal. 2023 Oct;34(7):729-744. doi: 10.1002/pca.3185. Epub 2022 Nov 11.

Predicting rice phenology across China by integrating crop phenology model and machine learning.

Sci Total Environ. 2024 Nov 15;951:175585. doi: 10.1016/j.scitotenv.2024.175585. Epub 2024 Aug 21.

Prediction of Dichloroethene Concentration in the Groundwater of a Contaminated Site Using XGBoost and LSTM.

Int J Environ Res Public Health. 2022 Jul 30;19(15):9374. doi: 10.3390/ijerph19159374.

Interpretable machine learning for predicting 28-day all-cause in-hospital mortality for hypertensive ischemic or hemorrhagic stroke patients in the ICU: a multi-center retrospective cohort study with internal and external cross-validation.

Front Neurol. 2023 Aug 8;14:1185447. doi: 10.3389/fneur.2023.1185447. eCollection 2023.

Comparison of an interpretable extreme gradient boosting model and an artificial neural network model for prediction of severe acute pancreatitis.

Pol Arch Intern Med. 2024 May 28;134(5). doi: 10.20452/pamw.16700. Epub 2024 Mar 15.

Cited by

Prediction Model of Powdery Mildew Disease Index in Rubber Trees Based on Machine Learning.

Plants (Basel). 2025 Aug 3;14(15):2402. doi: 10.3390/plants14152402.

Machine learning-optimized bioprocess for macroidin production by Lysinibacillus macroides and its biomedical applications.

Bioprocess Biosyst Eng. 2025 Jun 4. doi: 10.1007/s00449-025-03183-9.

Machine learning insights into the antioxidant and biomolecular shielding effects of polyphenol-rich 18 date palm pit extracts.

Food Chem X. 2025 Apr 19;27:102480. doi: 10.1016/j.fochx.2025.102480. eCollection 2025 Apr.

Leveraging machine learning in precision medicine to unveil organochlorine pesticides as predictive biomarkers for thyroid dysfunction.

Sci Rep. 2025 Apr 11;15(1):12501. doi: 10.1038/s41598-025-94827-z.
