Department of Electrical Engineering, Guizhou University, Guiyang 550025, China.
Sensors (Basel). 2023 Feb 6;23(4):1811. doi: 10.3390/s23041811.
Selecting the best planting area for blueberries is an essential issue in agriculture. To improve the effectiveness of blueberry cultivation, this paper proposes, for the first time, a machine learning-based classification model for blueberry ecological suitability and validates it using multi-source environmental feature data. The sparrow search algorithm (SSA) was adopted to optimize a CatBoost model, which classifies the ecological suitability of blueberries using a selected subset of the data features. First, the Borderline-SMOTE algorithm was used to balance the numbers of positive and negative samples. The Variance Inflation Factor (VIF) and information gain methods were then applied to screen the factors affecting blueberry growth. Next, the processed data were fed into CatBoost for training, and SSA was used to optimize the CatBoost parameters and obtain the optimal model. Finally, the SSA-CatBoost model was used to classify the ecological suitability of blueberries and output the suitability types. In a case study of a blueberry plantation in Majiang County, Guizhou Province, China, the SSA-CatBoost-based blueberry ecological suitability model achieved an AUC of 0.921. This is a relative improvement of 2.68% over CatBoost alone (AUC = 0.897) and is clearly higher than Logistic Regression (AUC = 0.855), Support Vector Machine (AUC = 0.864), and Random Forest (AUC = 0.875). Furthermore, the ecological suitability of blueberries in Majiang County was mapped from the classification results of the different models. When compared with the actual blueberry cultivation in Majiang County, the classification results of the proposed SSA-CatBoost model showed the closest match, so the model offers a strong reference for selecting blueberry cultivation sites.
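The class-balancing step can be illustrated with a small sketch. What follows is a minimal NumPy version of the Borderline-SMOTE1 variant. It assumes numeric feature vectors, binary labels, and no duplicate points. The abstract does not say which variant or neighbourhood sizes the authors used, so the function name `borderline_smote` and the parameters `m` and `k` are hypothetical. Only minority samples whose neighbourhood is mostly, but not entirely, majority-class (the "danger" set) serve as seeds for synthetic points.

```python
import numpy as np


def borderline_smote(X, y, minority=1, m=5, k=5, seed=0):
    """Borderline-SMOTE1 sketch (hypothetical helper): oversample near the class border.

    X: (n, d) numeric features; y: (n,) binary labels. Synthesises enough minority
    samples to equalise the two class counts. Assumes no duplicate points, so each
    point is its own nearest neighbour (distance 0) and can be skipped.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    X_min = X[y == minority]
    n_new = int(np.sum(y != minority)) - len(X_min)
    if n_new <= 0 or len(X_min) < 2:
        return X, y

    # Squared distances from every minority point to every sample.
    d_all = ((X_min[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    danger = []
    for i, row in enumerate(d_all):
        nn = np.argsort(row)[1:m + 1]          # m nearest neighbours, self skipped
        n_major = int(np.sum(y[nn] != minority))
        # "Danger": at least half, but not all, neighbours are majority class.
        # All-majority neighbourhoods are treated as noise and ignored.
        if m / 2 <= n_major < m:
            danger.append(i)
    if not danger:
        return X, y

    # k nearest minority neighbours of each minority point (self skipped).
    d_min = ((X_min[:, None, :] - X_min[None, :, :]) ** 2).sum(axis=2)
    k_eff = min(k, len(X_min) - 1)
    synth = []
    for _ in range(n_new):
        i = rng.choice(danger)
        j = rng.choice(np.argsort(d_min[i])[1:k_eff + 1])
        # New point on the segment between a danger seed and a minority neighbour.
        synth.append(X_min[i] + rng.random() * (X_min[j] - X_min[i]))

    X_out = np.vstack([X, np.array(synth)])
    y_out = np.concatenate([y, np.full(n_new, minority, dtype=y.dtype)])
    return X_out, y_out
```

For real data, the imbalanced-learn package provides `imblearn.over_sampling.BorderlineSMOTE` (with `kind="borderline-1"` or `"borderline-2"`), which implements the same idea with more options.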
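The multicollinearity screen can be sketched using the identity that the VIF of feature j equals the j-th diagonal entry of the inverse feature-correlation matrix. That entry is the same quantity as 1 / (1 - R_j^2), where R_j^2 comes from regressing feature j on all the others. The cut-off of 10 and the rule of dropping the highest-VIF feature first are common conventions assumed here. The abstract does not state which threshold the authors used, and `vif` and `drop_collinear` are hypothetical helper names.

```python
import numpy as np


def vif(X):
    """VIF of each column of X (n_samples, n_features) via diag(inv(corr))."""
    R = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    return np.diag(np.linalg.inv(R))


def drop_collinear(X, names, threshold=10.0):
    """Repeatedly drop the feature with the largest VIF until all are below threshold.

    The threshold of 10 is an assumed convention, not a value from the paper.
    """
    X = np.asarray(X, dtype=float)
    names = list(names)
    while X.shape[1] > 1:
        v = vif(X)
        worst = int(np.argmax(v))
        if v[worst] < threshold:
            break
        X = np.delete(X, worst, axis=1)
        del names[worst]
    return X, names
```

Dropping features one at a time matters because removing a single collinear feature usually lowers the VIFs of the features it was entangled with.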
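Information gain ranks candidate factors by how much knowing a factor reduces uncertainty about the suitability label: IG(Y; X) = H(Y) - H(Y | X). Continuous environmental factors must first be discretised. The equal-width binning with 5 bins used below is an assumption, because the abstract does not describe the authors' discretisation. The function names are also hypothetical.

```python
import numpy as np


def entropy(labels):
    """Shannon entropy (bits) of a 1-D array of discrete labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


def information_gain(feature, y, bins=5):
    """IG(y; feature) = H(y) - H(y | binned feature), using equal-width bins (assumed)."""
    feature = np.asarray(feature, dtype=float)
    y = np.asarray(y)
    edges = np.linspace(feature.min(), feature.max(), bins + 1)
    # Interior edges only, so every value maps to a bin index in 0..bins-1.
    codes = np.digitize(feature, edges[1:-1])
    h_cond = 0.0
    for c in np.unique(codes):
        mask = codes == c
        # Weight each bin's label entropy by the share of samples in that bin.
        h_cond += mask.mean() * entropy(y[mask])
    return entropy(y) - h_cond
```

Factors whose gain falls below a chosen cut-off (hypothetical here) would be discarded before training.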
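The SSA optimisation step can be sketched as a generic box-constrained minimiser. In this population-based search, the best-ranked "producers" explore, "scroungers" either follow the best producer or fly elsewhere, and a random subset of "danger-aware" sparrows moves toward safety. The sketch follows the commonly published update rules. The defaults (20% producers, 10% danger-aware sparrows, safety threshold 0.8) are illustrative assumptions, not the paper's settings. In the paper's setting, `f` would map a candidate vector of CatBoost hyperparameters to a validation loss, such as 1 - AUC. The abstract does not say which parameters were tuned, so that wiring is left out and the sphere function stands in as the objective. Any integer-valued hyperparameter would need rounding, which is also an assumption.

```python
import numpy as np


def ssa_minimize(f, lb, ub, n_pop=30, n_iter=100, p_frac=0.2, s_frac=0.1, st=0.8, seed=0):
    """Sparrow search algorithm sketch: minimise f over the box [lb, ub].

    p_frac: share of producers; s_frac: share of danger-aware sparrows;
    st: safety threshold. Returns the best (position, value) seen in any iteration.
    """
    rng = np.random.default_rng(seed)
    lb = np.asarray(lb, dtype=float)
    ub = np.asarray(ub, dtype=float)
    dim = lb.size
    X = lb + rng.random((n_pop, dim)) * (ub - lb)
    fit = np.array([f(x) for x in X], dtype=float)
    n_prod = max(1, int(p_frac * n_pop))     # producers (finders)
    n_aware = max(1, int(s_frac * n_pop))    # sparrows that sense danger
    b = int(np.argmin(fit))
    best_x, best_f = X[b].copy(), float(fit[b])

    def evaluate(idx):
        # Clip to the box, re-score, and update the best solution seen so far.
        nonlocal best_x, best_f
        if len(idx) == 0:
            return
        X[idx] = np.clip(X[idx], lb, ub)
        fit[idx] = [f(x) for x in X[idx]]
        j = idx[int(np.argmin(fit[idx]))]
        if fit[j] < best_f:
            best_x, best_f = X[j].copy(), float(fit[j])

    for _ in range(n_iter):
        order = np.argsort(fit)                  # rank sparrows, best first
        X[:] = X[order]
        fit[:] = fit[order]
        x_worst, f_worst = X[-1].copy(), float(fit[-1])

        # Producers: wide search if no alarm (R2 < ST), else a Gaussian jump.
        r2 = rng.random()
        for i in range(n_prod):
            if r2 < st:
                alpha = 1.0 - rng.random()       # uniform on (0, 1]
                X[i] = X[i] * np.exp(-(i + 1) / (alpha * n_iter))
            else:
                X[i] = X[i] + rng.normal() * np.ones(dim)
        evaluate(np.arange(n_prod))
        x_p = X[int(np.argmin(fit[:n_prod]))].copy()   # best producer position

        # Scroungers: the worse-ranked half (i > n/2) is starving and flies
        # elsewhere; the rest move next to the best producer.
        for i in range(n_prod, n_pop):
            if i + 1 > n_pop / 2:
                X[i] = rng.normal() * np.exp((x_worst - X[i]) / (i + 1) ** 2)
            else:
                a = rng.choice([-1.0, 1.0], size=dim)
                # |X_i - X_P| A^+ L, where A^+ = A^T (A A^T)^-1 = A^T / dim.
                X[i] = x_p + (np.abs(X[i] - x_p) @ a / dim) * np.ones(dim)
        evaluate(np.arange(n_prod, n_pop))

        # Danger-aware sparrows: edge birds move toward the best position;
        # a bird already at the best moves randomly relative to the worst.
        aware = rng.choice(n_pop, size=n_aware, replace=False)
        for j in aware:
            if fit[j] > best_f:
                X[j] = best_x + rng.normal() * np.abs(X[j] - best_x)
            else:
                k = rng.uniform(-1.0, 1.0)
                X[j] = X[j] + k * np.abs(X[j] - x_worst) / (fit[j] - f_worst + 1e-50)
        evaluate(aware)

    return best_x, best_f
```

Usage with the stand-in objective: `ssa_minimize(lambda x: float(np.sum(x ** 2)), [-5.0] * 5, [5.0] * 5)` searches a 5-dimensional box. For hyperparameter tuning, each call to `f` would be one full model fit, so `n_pop * n_iter` bounds the number of training runs.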
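The evaluation metric and the quoted improvement can be checked directly. AUC equals the probability that a randomly chosen positive sample scores above a randomly chosen negative one, with ties counting one half. The pairwise loop below is O(n_pos * n_neg) and serves only as an illustration; scikit-learn's `roc_auc_score` is the usual choice for real data. The 2.68% figure is a relative gain, and the absolute AUC difference between SSA-CatBoost and CatBoost is 0.024.

```python
def auc(scores, labels):
    """ROC AUC as P(random positive outscores random negative); ties count 1/2."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def relative_gain(new, base):
    """Relative improvement of `new` over `base`, in percent."""
    return 100.0 * (new - base) / base


# AUC values quoted in the abstract: SSA-CatBoost 0.921 vs CatBoost 0.897.
print(round(relative_gain(0.921, 0.897), 2))   # 2.68
```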