Institute of Crop Science and Resource Conservation, University of Bonn, Bonn, 53111, Germany.
Department of Business Analytics, Tippie College of Business, University of Iowa, Iowa, USA.
Sci Rep. 2022 Feb 25;12(1):3215. doi: 10.1038/s41598-022-06249-w.
Crop yield forecasting depends on many interacting factors, including crop genotype, weather, soil, and management practices. This study analyzes the performance of machine learning and deep learning methods for winter wheat yield prediction using an extensive dataset of weather, soil, and crop phenology variables for 271 counties across Germany from 1999 to 2019. We propose a Convolutional Neural Network (CNN) model that uses a one-dimensional convolution to capture the temporal dependencies of environmental variables. As baselines, we used eight supervised machine learning models and benchmarked their yield predictions using root mean square error (RMSE), mean absolute error (MAE), and the correlation coefficient. Our findings suggest that nonlinear models, such as the proposed CNN, a Deep Neural Network (DNN), and XGBoost, capture the relationship between crop yield and the input data more effectively than linear models. The proposed CNN outperformed all baseline models for winter wheat yield prediction, achieving 7 to 14% lower RMSE, 3 to 15% lower MAE, and 4 to 50% higher correlation coefficients than the best-performing baseline across the test data. To address the seasonality of the data, we aggregated soil moisture and meteorological features at a weekly resolution. We also moved beyond prediction and interpreted the outputs of the proposed CNN using SHAP (SHapley Additive exPlanations) values and force plots, which provided key insights into the yield predictions, including the importance of each variable over time. We found the drained upper limit (DUL) of the soil, wind speed in week ten, and radiation in week seven to be the most critical features for winter wheat yield prediction.
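The abstract does not give the network configuration, so the following is only a minimal NumPy sketch of the ideas it names, not the authors' implementation. It assumes daily weather inputs are averaged into weeks (averaging is an assumption; sums may suit variables such as precipitation or radiation better), shows what a single 1-D convolutional layer computes when it slides over the week axis, and computes RMSE, MAE, and the Pearson correlation coefficient. It also assumes the reported percentages are relative differences from the best baseline's metric. All function names, shapes, and kernel sizes are hypothetical.

```python
# Illustrative sketch only: names, shapes, and the weekly-mean choice are assumptions,
# not the study's code.
import numpy as np


def weekly_mean(daily: np.ndarray, days_per_week: int = 7) -> np.ndarray:
    """Average a (days, features) array into (weeks, features).

    Assumption: a trailing partial week is dropped and every feature is averaged.
    """
    n_weeks = daily.shape[0] // days_per_week
    trimmed = daily[: n_weeks * days_per_week]
    return trimmed.reshape(n_weeks, days_per_week, daily.shape[1]).mean(axis=1)


def conv1d_relu(x: np.ndarray, w: np.ndarray, b: np.ndarray) -> np.ndarray:
    """One 1-D convolutional layer over the week axis (stride 1, no padding, ReLU).

    x: (weeks, in_channels) weekly features for one county-year
    w: (kernel_size, in_channels, out_channels) filter weights
    b: (out_channels,) biases
    Returns (weeks - kernel_size + 1, out_channels). As in common deep learning
    libraries, this is a cross-correlation: the kernel is not flipped.
    """
    k = w.shape[0]
    n_out = x.shape[0] - k + 1
    out = np.empty((n_out, w.shape[2]))
    for t in range(n_out):
        window = x[t : t + k]  # k consecutive weeks, all input variables
        out[t] = np.tensordot(window, w, axes=([0, 1], [0, 1])) + b
    return np.maximum(out, 0.0)  # ReLU activation


def rmse(y: np.ndarray, y_hat: np.ndarray) -> float:
    """Root mean square error."""
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))


def mae(y: np.ndarray, y_hat: np.ndarray) -> float:
    """Mean absolute error."""
    return float(np.mean(np.abs(y - y_hat)))


def pearson_r(y: np.ndarray, y_hat: np.ndarray) -> float:
    """Pearson correlation coefficient between observed and predicted yields."""
    return float(np.corrcoef(y, y_hat)[0, 1])


def percent_change(model: float, baseline: float) -> float:
    """Relative difference of a metric versus a baseline, in percent."""
    return (model - baseline) / baseline * 100.0


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    daily = rng.normal(size=(266, 3))  # hypothetical: 266 days x 3 weather variables
    weekly = weekly_mean(daily)  # shape (38, 3)
    w = rng.normal(size=(3, 3, 4))  # hypothetical kernel: 3 weeks, 3 inputs, 4 filters
    features = conv1d_relu(weekly, w, np.zeros(4))
    print(features.shape)  # (36, 4)
```

Each output row mixes a window of consecutive weeks, which is how a 1-D convolution can pick up time dependencies such as a weather signal in one particular week. A full model would stack such layers and feed their flattened outputs into dense layers trained on county yields; in practice a framework layer such as Keras `Conv1D` or PyTorch `torch.nn.Conv1d` would perform the same computation.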