Laboratoire de Géosimulation Environnementale (LEDGE), Département de Géographie, Université de Montréal, 1375 Avenue Thérèse-Lavoie-Roux, Montréal, QC H2V 0B3, Canada.
Sci Total Environ. 2024 Nov 15;951:175764. doi: 10.1016/j.scitotenv.2024.175764. Epub 2024 Aug 23.
Accurate crop yield predictions are crucial for farmers and policymakers. Despite the widespread use of ensemble machine learning (ML) models in computer science, their application in crop yield prediction remains relatively underexplored. This study, conducted in Canada, aims to assess the potential of five distinct ensemble ML models, namely Adaptive Boosting (AdaBoost), Gradient Boosting Machine (GBM), XGBoost, LightGBM, and Random Forest (RF), in predicting crop yields. These models were chosen for their ability to handle complex datasets and their strong performance potential. The study integrated various factors, including climate variables, satellite-derived vegetation indices, soil characteristics, and honeybee census data. Data preparation comprised two main steps. First, climate variables were interpolated and averaged over croplands in ArcGIS Pro, vegetation indices and soil characteristics were likewise averaged, and honeybee census data were incorporated. Second, the data were organized in Python into a structured format for model input. The models' accuracy was assessed using Root Mean Squared Error (RMSE), R-squared, and Mean Absolute Error (MAE). XGBoost emerged as the most accurate model, with the lowest MAE (68.70 for canola and 39.47 for soybeans), the lowest RMSE (119.48 for canola and 102.39 for soybeans), and the highest R-squared values (0.95 for canola and 0.96 for soybeans) on the test dataset. The study also assessed crop yields under various climate change scenarios, finding minimal variation among the scenarios but significant negative impacts on canola and soybean yields across Canada. Honeybee colonies were identified as the most influential factor on crop yields, contributing 52.34% to canola and 57.18% to soybean yields. This research provides detailed crop yield maps of canola and soybeans at the Census Consolidated Subdivisions (CCS) level across Canada's agricultural landscape, offering valuable forecasts for localized decision-making.
Additionally, it offers a proactive strategy for climate change preparedness, helping farmers and stakeholders optimise resource allocation and manage risks effectively.
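The three accuracy metrics named in the abstract (MAE, RMSE, and R-squared) can be illustrated with a minimal sketch. This is not the authors' code: the function name `regression_metrics` and the sample yield values are hypothetical, chosen only to show how each metric is computed from paired observed and predicted values using their standard definitions.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (mae, rmse, r2) for paired observed and predicted values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    # Mean Absolute Error: average magnitude of the errors
    mae = sum(abs(e) for e in errors) / n

    # Root Mean Squared Error: penalises large errors more heavily
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)

    # R-squared: 1 minus residual variation over total variation
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R-squared is undefined when all observed values are equal")
    r2 = 1.0 - ss_res / ss_tot

    return mae, rmse, r2


# Hypothetical yields for illustration only (not data from the study)
observed = [2000.0, 2500.0, 3000.0, 3500.0]
predicted = [2100.0, 2400.0, 3100.0, 3400.0]
mae, rmse, r2 = regression_metrics(observed, predicted)
```

Lower MAE and RMSE and a higher R-squared indicate a better fit, which is the basis on which the abstract ranks XGBoost first on the test dataset.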