M'hamdi Oussama, Takács Sándor, Palotás Gábor, Ilahy Riadh, Helyes Lajos, Pék Zoltán
Institute of Horticultural Sciences, Hungarian University of Agriculture and Life Sciences, Páter K. Str. 1, 2100 Gödöllő, Hungary.
Doctoral School of Plant Science, Hungarian University of Agriculture and Life Sciences, Páter K. Str. 1, 2100 Gödöllő, Hungary.
Plants (Basel). 2024 Mar 6;13(5):746. doi: 10.3390/plants13050746.
Tomato as a raw material for processing is globally important and is pivotal in dietary and agronomic research because of its nutritional, economic, and health significance. This study explored the potential of machine learning (ML) for predicting tomato quality, using data from 48 cultivars and 28 locations in Hungary over 5 seasons. It focused on three quality traits, °Brix, lycopene content, and colour (a/b ratio), modelled with extreme gradient boosting (XGBoost) and artificial neural network (ANN) models. XGBoost consistently outperformed ANN, achieving high accuracy in predicting °Brix (R² = 0.98, RMSE = 0.07), lycopene content (R² = 0.87, RMSE = 0.61), and colour (a/b ratio; R² = 0.93, RMSE = 0.03). ANN lagged behind, particularly in colour prediction, where its R² was negative (-0.35). Shapley additive explanations (SHAP) summary plots indicated that both models were effective in predicting °Brix and lycopene content while drawing on different aspects of the data, and underscored the significant influence of cultivar choice and environmental factors such as climate and soil. These findings emphasize the importance of selecting and fine-tuning the appropriate ML model for precision agriculture and underline XGBoost's superiority in handling complex agronomic data for quality assessment.
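The two metrics quoted in the abstract, R² and RMSE, can be illustrated with a minimal sketch. It uses the standard textbook definitions and is not the authors' evaluation code; the a/b values are made up for illustration and are not data from the study. It shows how R² turns negative, as reported for the ANN colour model, once predictions fit worse than simply predicting the mean of the observations.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error: square root of the mean squared residual."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Equals 0 for a model that always predicts the mean of y_true,
    and is negative for a model that does worse than that.
    """
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical a/b colour ratios (illustrative only, not from the paper).
observed = [2.0, 2.2, 2.4, 2.6]
good_model = [2.05, 2.15, 2.45, 2.55]  # tracks the observations closely
poor_model = [2.6, 2.0, 2.7, 2.1]      # worse than predicting the mean

for name, predicted in [("good", good_model), ("poor", poor_model)]:
    print(name, "R2 =", round(r_squared(observed, predicted), 2),
          "RMSE =", round(rmse(observed, predicted), 2))
```

A negative R² therefore does not indicate a calculation error; it signals that the model explains less variance than a constant baseline would.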