An Improved CatBoost-Based Classification Model for Ecological Suitability of Blueberries.

Affiliation

Department of Electrical Engineering, Guizhou University, Guiyang 550025, China.

Publication information

Sensors (Basel). 2023 Feb 6;23(4):1811. doi: 10.3390/s23041811.

Abstract

Selecting the best planting area for blueberries is an important problem in agriculture. To improve the effectiveness of blueberry cultivation, this paper proposes, for the first time, a machine learning-based classification model for blueberry ecological suitability and validates it on multi-source environmental feature data. The sparrow search algorithm (SSA) is used to optimize a CatBoost model that classifies the ecological suitability of blueberries from a selected set of data features. First, the Borderline-SMOTE algorithm is used to balance the number of positive and negative samples, and the Variance Inflation Factor and information gain methods are applied to select the factors that affect blueberry growth. Next, the processed data are fed into CatBoost for training, and SSA tunes the CatBoost parameters to obtain the optimal model. Finally, the SSA-CatBoost model classifies the ecological suitability of blueberries and outputs the suitability types. In a case study of a blueberry plantation in Majiang County, Guizhou Province, China, the SSA-CatBoost model achieves an AUC of 0.921, which is 2.68% higher in relative terms than that of CatBoost (AUC = 0.897) and significantly higher than those of Logistic Regression (AUC = 0.855), Support Vector Machine (AUC = 0.864), and Random Forest (AUC = 0.875). Furthermore, the ecological suitability of blueberries in Majiang County is mapped according to the classification results of the different models. Compared against actual blueberry cultivation in Majiang County, the classification results of the proposed SSA-CatBoost model match the real situation best, which makes the model a valuable reference for selecting blueberry cultivation sites.
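
The AUC comparison in the abstract can be made concrete with a short sketch. The code below is not the authors' evaluation code. It shows one common way to compute AUC (the fraction of positive/negative pairs in which the positive sample receives the higher score) and reproduces the 2.68% figure as a relative gain over the CatBoost baseline. The function names `roc_auc` and `relative_gain_percent` and the toy labels and scores are hypothetical, introduced only for illustration; the five AUC values are the ones reported in the abstract.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def relative_gain_percent(new: float, baseline: float) -> float:
    """Relative improvement of `new` over `baseline`, in percent."""
    return (new - baseline) / baseline * 100.0


# AUC values as reported in the abstract.
reported_auc = {
    "SSA-CatBoost": 0.921,
    "CatBoost": 0.897,
    "Random Forest": 0.875,
    "Support Vector Machine": 0.864,
    "Logistic Regression": 0.855,
}

# Hypothetical toy data (1 = suitable, 0 = unsuitable), only to exercise roc_auc.
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.7, 0.6, 0.4, 0.3, 0.1]

if __name__ == "__main__":
    print(f"toy AUC: {roc_auc(toy_labels, toy_scores):.3f}")
    best = reported_auc["SSA-CatBoost"]
    for name, auc in reported_auc.items():
        if name != "SSA-CatBoost":
            gain = relative_gain_percent(best, auc)
            print(f"SSA-CatBoost vs {name}: +{gain:.2f}% relative AUC")
```

On these numbers the absolute AUC difference between SSA-CatBoost and CatBoost is 0.024, so the 2.68% in the abstract is consistent with a relative gain, (0.921 - 0.897) / 0.897. The pairwise loop is quadratic in the sample count; it is chosen here for readability, and a rank-based computation would be the usual choice for large datasets.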

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fec3/9961688/07c1edcc2d35/sensors-23-01811-g002.jpg
