School of Earth and Planetary Sciences, Spatial Sciences Discipline, Curtin University, Perth 6102, Australia.
Faculty of Surveying, Mapping and Geographic Information, Hanoi University of Natural Resources and Environment, Hanoi 100000, Vietnam.
Sensors (Basel). 2022 Sep 1;22(17):6609. doi: 10.3390/s22176609.
Machine learning (ML) has been widely used worldwide to develop crop yield forecasting models. However, identifying the most critical features in a dataset remains challenging. Although either feature selection (FS) or feature extraction (FX) techniques have been employed, no research has compared their performance or, more importantly, examined the benefits of combining the two. Therefore, this paper proposes a framework that uses no feature reduction (All-F) as a baseline to investigate the performance of FS, FX, and a combination of both (FSX). The case study employs the vegetation condition index (VCI)/temperature condition index (TCI) to develop 21 rice yield forecasting models for eight sub-regions in Vietnam based on ML methods, namely linear, support vector machine (SVM), decision tree (Tree), artificial neural network (ANN), and Ensemble. The results reveal that FSX takes full advantage of both FS and FX: FSX-based models perform best in 18 of the 21 models, whereas FS-based and FX-based models perform best in 2 and 1 models, respectively. These FSX-, FS-, and FX-based models improve on the All-F-based models by 21% on average and by up to 60% in terms of RMSE. Furthermore, the 21 best models are built on Ensemble (13 models), Tree (6 models), linear (1 model), and ANN (1 model). These findings highlight the significant role of FS, FX, and especially FSX, coupled with a wide range of ML algorithms (especially Ensemble), in enhancing the accuracy of crop yield prediction.
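The comparison described in the abstract can be illustrated with a minimal sketch. The abstract does not name the specific FS or FX techniques, so everything here is an assumption chosen for illustration: FS is a simple correlation filter, FX is principal component analysis (PCA), FSX applies PCA to the FS-selected features, the forecasting model is ordinary least squares, and the data are synthetic rather than VCI/TCI series. The function names (`select_top_k`, `fit_pca`, `compare_strategies`) and the parameters `k` and `n_components` are hypothetical.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def select_top_k(X_train, y_train, k):
    """FS (assumed): indices of the k features with the largest |Pearson r| to the target."""
    Xc = X_train - X_train.mean(axis=0)
    yc = y_train - y_train.mean()
    corr = (Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc) + 1e-12)
    return np.argsort(-np.abs(corr))[:k]


def fit_pca(X_train, n_components):
    """FX (assumed): PCA via SVD; returns the training mean and the component matrix."""
    mean = X_train.mean(axis=0)
    _, _, vt = np.linalg.svd(X_train - mean, full_matrices=False)
    return mean, vt[:n_components].T


def fit_predict_linear(X_train, y_train, X_test):
    """Ordinary least squares with an intercept, standing in for the ML model."""
    A_train = np.column_stack([np.ones(len(X_train)), X_train])
    coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
    return np.column_stack([np.ones(len(X_test)), X_test]) @ coef


def compare_strategies(X_train, y_train, X_test, y_test, k=6, n_components=3):
    """Return test RMSE for All-F, FS, FX, and FSX; reducers are fitted on training data only."""
    results = {}

    # All-F: baseline with no feature reduction.
    results["All-F"] = rmse(y_test, fit_predict_linear(X_train, y_train, X_test))

    # FS: keep a subset of the original features.
    idx = select_top_k(X_train, y_train, k)
    results["FS"] = rmse(
        y_test, fit_predict_linear(X_train[:, idx], y_train, X_test[:, idx])
    )

    # FX: project all features onto principal components.
    mean, comps = fit_pca(X_train, n_components)
    results["FX"] = rmse(
        y_test,
        fit_predict_linear((X_train - mean) @ comps, y_train, (X_test - mean) @ comps),
    )

    # FSX: select features first, then extract components from the selected subset.
    mean_s, comps_s = fit_pca(X_train[:, idx], min(n_components, k))
    results["FSX"] = rmse(
        y_test,
        fit_predict_linear(
            (X_train[:, idx] - mean_s) @ comps_s,
            y_train,
            (X_test[:, idx] - mean_s) @ comps_s,
        ),
    )
    return results


def improvement_over_baseline(results):
    """Percentage RMSE reduction of each strategy relative to All-F."""
    base = results["All-F"]
    return {name: 100.0 * (base - value) / base for name, value in results.items()}


# Synthetic demonstration data (not the paper's data): 20 features, 3 informative.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 20))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + X[:, 2] + rng.normal(scale=0.5, size=60)
demo = compare_strategies(X[:40], y[:40], X[40:], y[40:])
for name, gain in improvement_over_baseline(demo).items():
    print(f"{name}: RMSE={demo[name]:.3f}, improvement vs All-F={gain:.1f}%")
```

Which strategy wins on a given dataset depends on the data, the model, and the choices of `k` and `n_components`; the sketch only shows how the four variants can be evaluated side by side against the All-F baseline using the same RMSE-based improvement measure the abstract reports.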