Zuo Yanan, Ji Min, Yang Jiutao, Li Zhenjin, Wang Jing
College of Geodesy and Geomatics, Shandong University of Science and Technology, Qingdao, 266590, China.
Shandong Agricultural Technology Extension Center, Jinan, 250000, China.
Sci Rep. 2025 Jul 31;15(1):28036. doi: 10.1038/s41598-025-13067-3.
As a typical pest affecting corn yield and safety, the corn borer causes serious economic losses worldwide. Climate warming has intensified pest outbreaks in recent years, but the associated risk has not been precisely assessed or understood. To address this gap, this paper took Shandong Province, China, as a case study and constructed two things: a feature optimization model that handles class imbalance, and a novel risk assessment method that quantifies the temporal and spatial distribution of corn borer occurrence risk. Class imbalance is common in pest datasets. To handle it, a feature optimization model was built that uses Borderline-SMOTE to improve the Genetic Algorithm-Random Forest (GA-RF). This model was combined with the Pearson correlation coefficient to obtain a subset of features that affect corn borer occurrence. Next, because traditional risk assessment models tend to lose spatial information, a novel machine learning model that incorporates weighted clustering was proposed to assess the risk of agricultural pests and diseases. Finally, drawing on natural disaster risk theory, this paper assessed and zoned corn borer risk in the study area in terms of hazard, sensitivity, disaster prevention and mitigation capacity, and comprehensive state. The results indicated that, compared with the original RF model, the improved feature optimization model increased OOB_score, Accuracy, and F1_score by 18.64%, 11.12%, and 11.21%, respectively, and outperformed eight other benchmark models.
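The class-imbalance step relies on Borderline-SMOTE, which oversamples only the minority samples that sit near the class boundary. What follows is a minimal NumPy sketch of the standard borderline-1 variant, not the authors' implementation. The function name `borderline_smote` and its parameters are illustrative assumptions: `m` is the number of neighbours used for the danger test, and `k` is the number of minority neighbours used for interpolation. The GA-RF feature search that would consume the balanced data is not shown.

```python
import numpy as np


def borderline_smote(X, y, minority_label, n_new, m=5, k=5, seed=0):
    """Borderline-SMOTE (borderline-1) sketch; hypothetical helper, not the paper's code.

    Synthesises n_new minority samples by interpolating between "danger"
    minority points (near the class boundary) and their minority neighbours.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    idx_min = np.flatnonzero(y == minority_label)
    minority = X[idx_min]
    if n_new <= 0 or len(idx_min) < 2:
        return X, y

    # Distances from each minority point to every sample, excluding the point itself.
    d_all = np.linalg.norm(minority[:, None, :] - X[None, :, :], axis=2)
    d_all[np.arange(len(idx_min)), idx_min] = np.inf
    danger = []
    for i, row in enumerate(d_all):
        nn = np.argsort(row)[:m]
        n_major = np.sum(y[nn] != minority_label)
        # Danger: at least half, but not all, of the m neighbours are majority
        # (points with only majority neighbours are treated as noise and skipped).
        if m / 2 <= n_major < m:
            danger.append(i)
    if not danger:
        return X, y  # no borderline minority points to oversample

    # Minority-to-minority distances for choosing interpolation partners.
    d_min = np.linalg.norm(minority[:, None, :] - minority[None, :, :], axis=2)
    np.fill_diagonal(d_min, np.inf)
    k = min(k, len(idx_min) - 1)
    synthetic = np.empty((n_new, X.shape[1]))
    for t in range(n_new):
        i = rng.choice(danger)
        j = rng.choice(np.argsort(d_min[i])[:k])
        # The new point lies on the segment between seed i and neighbour j.
        synthetic[t] = minority[i] + rng.random() * (minority[j] - minority[i])
    X_new = np.vstack([X, synthetic])
    y_new = np.concatenate([y, np.full(n_new, minority_label)])
    return X_new, y_new
```

A minority point counts as "danger" when at least half, but not all, of its `m` nearest neighbours belong to the majority class. Interpolating only from these points concentrates the synthetic samples where a classifier such as a random forest is most likely to confuse the classes.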
In terms of clustering performance, the weighted K-means clustering algorithm was compared with the weighted agglomerative hierarchical clustering algorithm (weighted AHC) and weighted DBSCAN. Against these two algorithms, respectively, weighted K-means achieved a Silhouette coefficient higher by 0.0138 and 0.1885, a Calinski-Harabasz index higher by 3.8017 and 22.4039, and a Davies-Bouldin index lower by 0.1006 and 0.4889, demonstrating superior clustering results. The spatial zoning results closely matched actual conditions. Corn borer occurrence risk was concentrated in the southwestern and northern parts of Shandong Province, whereas risk in the central and southeastern parts was relatively low. This research provides a novel approach to agricultural disaster risk assessment, and its results can support decision-making for corn borer prevention and control in Shandong Province.
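The weighted clustering and the three indices used in the comparison can be illustrated with a small NumPy sketch. It assumes that "weighted" means per-feature weights `w` in the squared Euclidean distance. The abstract does not state the weighting scheme, so this is one plausible instance rather than the paper's model. The seeding is also a swapped-in choice: a deterministic maximin rule rather than k-means++. The helper names (`weighted_kmeans`, `silhouette`, `calinski_harabasz`, `davies_bouldin`) are hypothetical. The metric functions follow the standard definitions: a higher Silhouette coefficient, a higher Calinski-Harabasz index, and a lower Davies-Bouldin index each indicate better clustering.

```python
import numpy as np


def weighted_kmeans(X, w, k, n_iter=100, seed=0):
    """Feature-weighted K-means sketch (assumed form): distance = sum_f w_f * (x_f - c_f)**2.

    Rescales each feature by sqrt(w_f) and runs Lloyd's algorithm on the
    rescaled matrix Z, which minimises the same weighted objective.
    """
    rng = np.random.default_rng(seed)
    Z = np.asarray(X, dtype=float) * np.sqrt(np.asarray(w, dtype=float))
    # Maximin seeding: random first centre, then repeatedly the farthest point.
    centers = [Z[rng.integers(len(Z))]]
    for _ in range(1, k):
        d = np.min([((Z - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Z[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        d = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Keep the old centre if a cluster becomes empty.
        new_centers = np.array([Z[labels == c].mean(axis=0) if np.any(labels == c)
                                else centers[c] for c in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, Z


def silhouette(Z, labels):
    """Mean silhouette coefficient (higher is better); needs at least 2 clusters."""
    D = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=2)
    clusters = np.unique(labels)
    s = np.zeros(len(Z))
    for i in range(len(Z)):
        same = labels == labels[i]
        if same.sum() == 1:
            continue  # singleton cluster: s = 0 by convention
        a = D[i, same].sum() / (same.sum() - 1)  # D[i, i] = 0 adds nothing, so divide by n_c - 1
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        s[i] = (b - a) / max(a, b)
    return float(s.mean())


def calinski_harabasz(Z, labels):
    """Between- vs within-cluster dispersion ratio (higher is better)."""
    clusters, n = np.unique(labels), len(Z)
    mean = Z.mean(axis=0)
    between = sum((labels == c).sum() * ((Z[labels == c].mean(axis=0) - mean) ** 2).sum()
                  for c in clusters)
    within = sum(((Z[labels == c] - Z[labels == c].mean(axis=0)) ** 2).sum() for c in clusters)
    return float((between / (len(clusters) - 1)) / (within / (n - len(clusters))))


def davies_bouldin(Z, labels):
    """Average worst-case scatter-to-separation ratio (lower is better)."""
    clusters = np.unique(labels)
    cents = np.array([Z[labels == c].mean(axis=0) for c in clusters])
    scatter = np.array([np.linalg.norm(Z[labels == c] - cents[j], axis=1).mean()
                        for j, c in enumerate(clusters)])
    worst = [max((scatter[i] + scatter[j]) / np.linalg.norm(cents[i] - cents[j])
                 for j in range(len(clusters)) if j != i)
             for i in range(len(clusters))]
    return float(np.mean(worst))
```

Scaling each feature by the square root of its weight before ordinary K-means is equivalent to clustering under the weighted distance. For that reason, the sketch computes all three indices on the rescaled matrix `Z`. A feature with weight 0 then has no influence on either the clusters or the scores.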