Sun Xiaoyun, Su Shuaiming, Wang Qiang, Xiong Shufeng, Li Yanting, Peng Hong, Shi Lei
College of Information and Management Science, Henan Agriculture University, Zhengzhou, Henan, China.
College of Computer Science and Technology, Zhengzhou University of Light Industry, Zhengzhou, Henan, China.
PeerJ Comput Sci. 2025 Mar 31;11:e2638. doi: 10.7717/peerj-cs.2638. eCollection 2025.
Fusarium head blight (FHB) is a destructive disease that reduces wheat yield, and its occurrence and spread are closely tied to meteorological conditions. First, eight meteorological factors are analyzed to identify the specific periods most closely related to wheat FHB severity: rainfall (RAIN), average sunshine hours (ASH), average wind speed (AWS), average temperature (AT), highest temperature (HT), lowest temperature (LT), average relative humidity (ARH), and maximum temperature difference (MTD). A wheat FHB severity dataset is constructed on this basis. Severity is then divided into four levels, and field data show that the high-prevalence level accounts for a relatively small proportion of the samples. To address this class imbalance, the K-means synthetic minority over-sampling technique (K-means-SMOTE) is used to generate additional samples for the underrepresented severity levels. A wheat FHB severity prediction model is then built on K-means-SMOTE and extreme gradient boosting (XGBoost). Next, by combining the model's ranking of the meteorological factors with the biological characteristics of wheat FHB, the number of factors is reduced from eight to four (AWS 4.24-4.28, RAIN 4.5-4.19, ARH 4.12-4.16, LT 4.19-4.23, each listed with its selected period). With the reduced set, accuracy and recall remain unchanged at 0.8936, the F1 score increases from 0.8851 to 0.8898, and precision decreases from 0.9249 to 0.9058. Although precision decreases slightly, most of the other evaluation indicators are unchanged or improved, so the reduced model is considered effective. Finally, comparative experiments with eight other models demonstrate the superiority of this approach.
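The abstract reports identical accuracy and recall (0.8936) alongside a higher precision. That pattern is what support-weighted averaging over the four severity levels would produce, since weighted recall is algebraically equal to accuracy. The abstract does not state which averaging scheme was used, so the sketch below is an illustration under that assumption, not the authors' evaluation code. The function name `weighted_metrics` and the toy labels are hypothetical and are not taken from the paper's data.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Return accuracy and support-weighted precision, recall, and F1.

    Assumption: per-class scores are averaged with weights proportional
    to the number of true samples in each class.
    """
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)      # true samples per class
    predicted = Counter(y_pred)    # predictions per class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    n = len(y_true)

    accuracy = sum(correct.values()) / n
    precision = recall = f1 = 0.0
    for c in labels:
        # Per-class scores; a class with no predictions or no support scores 0.
        p_c = correct[c] / predicted[c] if predicted[c] else 0.0
        r_c = correct[c] / support[c] if support[c] else 0.0
        f_c = 2 * p_c * r_c / (p_c + r_c) if (p_c + r_c) else 0.0
        w = support[c] / n         # weight = share of true samples
        precision += w * p_c
        recall += w * r_c
        f1 += w * f_c
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical, imbalanced toy labels for four severity levels (0 to 3).
y_true = [0] * 6 + [1] * 3 + [2] * 2 + [3] * 1
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 1, 3]

metrics = weighted_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

In this toy case the two misclassified samples are both assigned to level 1, which lowers that level's precision but leaves the other levels' precision at 1, so weighted precision ends up above weighted recall while recall matches accuracy exactly.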