Sirpa-Poma J W, Satgé F, Resongles E, Pillco-Zolá R, Molina-Carpio J, Flores Colque M G, Ormachea M, Pacheco Mollinedo P, Bonnet M-P
ESPACE-DEV, Univ Montpellier, IRD, Univ Antilles, Univ Guyane, Univ Réunion, 34093 Montpellier, France.
Universidad Mayor de San Andrés, Instituto de Hidráulica e Hidrología, La Paz, Bolivia.
Sensors (Basel). 2023 Nov 22;23(23):9328. doi: 10.3390/s23239328.
Several recent studies have demonstrated the relevance of machine learning for soil salinity mapping, using Sentinel-2 reflectance as input data and field soil salinity measurements (i.e., electrical conductivity, EC) as the target. Because soil EC monitoring is costly and time-consuming, most learning databases used for training and validation rely on a limited number of soil samples, which can affect model consistency. Because soil salinity varies little at the scale of a Sentinel-2 pixel, this study proposes to increase the number of observations in the learning database by assigning the EC value obtained on the sampled pixel to its eight neighboring pixels. The method extended the original learning database of 97 field EC measurements (OD) to an enhanced learning database of 691 observations (ED). Two classification machine-learning models (Random Forest, RF, and Support Vector Machine, SVM) were trained with both OD and ED, and the efficiency of the proposed method was assessed by comparing the models' outcomes with EC observations not used in the models' training. The use of ED led to a significant increase in both models' consistency, with the overall accuracy of the RF (SVM) model increasing from 0.25 (0.26) when using OD to 0.77 (0.55) when using ED. This corresponds to a relative improvement of approximately 208% and 111%, respectively. Besides the improved accuracy reached with the ED database, the results showed that the RF model provided better soil salinity estimates than the SVM model and that feature selection (i.e., Variance Inflation Factor, VIF, and/or Genetic Algorithm, GA) increases both models' reliability, with GA being the most efficient. This study highlights the potential of combining machine learning and Sentinel-2 imagery for soil salinity monitoring in a data-scarce context, and shows the importance of both model and feature selection for an optimal machine-learning set-up.
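The neighbor-assignment idea and the reported relative improvements can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the (row, col, label) sample format, and the rules for image edges and for pixels claimed by two samples with different labels are all assumptions made for illustration; the abstract does not state how such cases were handled.

```python
def expand_to_neighbors(samples, shape):
    """Copy each sampled pixel's label to its 3x3 neighborhood.

    samples: iterable of (row, col, label) tuples (hypothetical format).
    shape:   (n_rows, n_cols) of the image grid.
    Returns a dict mapping (row, col) -> label.
    """
    n_rows, n_cols = shape
    labels = {}
    ambiguous = set()
    for row, col, label in samples:
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                r, c = row + dr, col + dc
                # Assumption: neighbors falling outside the image are skipped.
                if not (0 <= r < n_rows and 0 <= c < n_cols):
                    continue
                # Assumption: a pixel reached from two samples with
                # different labels is treated as ambiguous.
                if (r, c) in labels and labels[(r, c)] != label:
                    ambiguous.add((r, c))
                labels[(r, c)] = label
    # Drop ambiguous pixels rather than pick one label arbitrarily.
    for key in ambiguous:
        del labels[key]
    return labels


def relative_gain(before, after):
    """Relative improvement in percent: (after - before) / before * 100."""
    return (after - before) / before * 100.0


# Two isolated samples: one in the image corner, one in the interior.
augmented = expand_to_neighbors([(0, 0, "low"), (5, 5, "high")], (10, 10))
print(len(augmented))

# Overall accuracies quoted in the abstract (OD -> ED).
print(relative_gain(0.25, 0.77))  # RF
print(relative_gain(0.26, 0.55))  # SVM
```

With rules like these, an augmented database can end up smaller than nine times the number of samples, since edge and ambiguous pixels are dropped. Whether that accounts for the 691 observations obtained from 97 samples is not stated in the abstract.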