Khan Yunish, Kumar Vinod, Gacem Amel, Satpathi Anurag, Setiya Parul, Surbhi Kumari, Nain Ajeet Singh, Vishwakarma Dinesh Kumar, Obaidullah Ahmad J, Yadav Krishna Kumar, Kisi Ozgur
Department of Mathematics, Statistics and Computer Science, College of Basic Science and Humanities, G.B. Pant University of Agriculture and Technology, Pantnagar, Uttarakhand, 263145, India.
Department of Physics, Faculty of Sciences, University 20 Août 1955, Skikda, Algeria.
Sci Rep. 2025 May 6;15(1):15790. doi: 10.1038/s41598-025-99427-5.
Forecasting the severity of crop diseases is crucial for agricultural productivity and can be achieved through statistical and machine learning techniques. Predictive models that consider weather conditions during critical growth stages of crops have shown promising accuracy. However, selecting the most suitable forecasting model remains a challenge. This research investigates the impact of various weather factors on Soybean Yellow Mosaic Virus (SYMV) incidence. Specifically, six multivariate models, namely Stepwise Multiple Linear Regression (SMLR), Artificial Neural Networks (ANN), Least Absolute Shrinkage and Selection Operator (LASSO), Ridge Regression (RR), Elastic Net (ELNET), and the hybrid SMLR-ANN, each applied both directly and in combination with Principal Component Analysis (PCA), were developed using 20 years of data (2001 to 2020) to predict the severity of soybean disease in Pantnagar, Uttarakhand. The dataset was divided into two parts, with 80% used for calibration and the remaining 20% for validation. Model accuracy was evaluated using several statistical criteria, including R², RMSE, nRMSE, MAE, PE, and EF. The results indicated that the PCA-SMLR-ANN model (nRMSE = 0.76%) was the most effective predictor of soybean disease severity, followed by the PCA-ANN model (nRMSE = 3.67%). Hybrid models such as PCA-SMLR-ANN and PCA-ANN outperformed individual models such as SMLR (nRMSE = 47.72%) and ANN (nRMSE = 6.82%). The performance ranking of the models is as follows: PCA-SMLR-ANN ≈ PCA-ANN ≈ SMLR-ANN ≈ ANN > PCA-ELNET > PCA-Ridge > ELNET ≈ RR > PCA-LASSO > LASSO > PCA-SMLR ≈ SMLR. These findings highlight the superior performance of hybrid models in predicting soybean disease severity based on weather indices in the study region.
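The abstract names its evaluation criteria without giving formulas, so the following is only a minimal sketch of how several of them are commonly computed, not the authors' implementation. It assumes that nRMSE is the RMSE expressed as a percentage of the observed mean, that EF is the Nash-Sutcliffe modelling efficiency, and that R² is the squared Pearson correlation between observed and predicted values. PE is omitted because the abstract does not define the abbreviation. All function names and the sample values are hypothetical.

```python
import math


def rmse(obs, pred):
    """Root mean square error between observed and predicted values."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def nrmse_percent(obs, pred):
    """RMSE as a percentage of the observed mean (assumed normalisation)."""
    mean_obs = sum(obs) / len(obs)
    return 100.0 * rmse(obs, pred) / mean_obs


def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def modelling_efficiency(obs, pred):
    """EF, assumed here to be 1 - SS_residual / SS_total (Nash-Sutcliffe form)."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def r_squared(obs, pred):
    """R², assumed here to be the squared Pearson correlation."""
    mean_obs = sum(obs) / len(obs)
    mean_pred = sum(pred) / len(pred)
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(obs, pred))
    var_obs = sum((o - mean_obs) ** 2 for o in obs)
    var_pred = sum((p - mean_pred) ** 2 for p in pred)
    return cov ** 2 / (var_obs * var_pred)


# Hypothetical validation set: with 20 years of data and an 80/20 split,
# 4 years would be held out. These severity values are made up.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]

scores = {
    "R2": r_squared(observed, predicted),
    "RMSE": rmse(observed, predicted),
    "nRMSE_%": nrmse_percent(observed, predicted),
    "MAE": mae(observed, predicted),
    "EF": modelling_efficiency(observed, predicted),
}
```

Under these definitions a lower nRMSE and an EF closer to 1 indicate a better fit, which is the sense in which the abstract ranks PCA-SMLR-ANN (nRMSE = 0.76%) ahead of SMLR (nRMSE = 47.72%).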