Mishra Umakant, Gautam Sagar, Riley William J, Hoffman Forrest M
Bioscience Division, Sandia National Laboratory, Livermore, CA, United States.
Earth and Environmental Sciences, Lawrence Berkeley National Lab, Berkeley, CA, United States.
Front Big Data. 2020 Oct 28;3:528441. doi: 10.3389/fdata.2020.528441. eCollection 2020.
Approaches of differing mathematical complexity are being applied to the spatial prediction of soil properties. Regression kriging is a widely used hybrid approach that combines the correlation between soil properties and environmental factors with the spatial autocorrelation between soil observations. In this study, we compared four machine learning approaches (gradient boosting machine, multivariate adaptive regression spline, random forest, and support vector machine) with regression kriging to predict the spatial variation of surface (0-30 cm) soil organic carbon (SOC) stocks at 250-m spatial resolution across the northern circumpolar permafrost region. We combined 2,374 soil profile observations (calibration datasets) with georeferenced datasets of environmental factors (climate, topography, land cover, bedrock geology, and soil types) to predict the spatial variation of surface SOC stocks. We evaluated the prediction accuracy at randomly selected sites (validation datasets) across the study area. We found that the techniques differed in the number of environmental factors they identified as predictors of SOC stocks and in the relative importance they assigned to those factors. Regression kriging produced lower prediction errors than multivariate adaptive regression spline and support vector machine, and prediction accuracy comparable to gradient boosting machine and random forest. However, the ensemble median prediction of SOC stocks obtained from all four machine learning techniques showed the highest prediction accuracy. Although the choice of approach for spatial prediction of soil properties will depend on the availability of soil and environmental datasets and on computational resources, we conclude that the ensemble median prediction obtained from multiple machine learning approaches provides greater spatial detail and produces the highest prediction accuracy. Thus, an ensemble prediction approach can be a better choice than any single prediction technique for predicting the spatial variation of SOC stocks.
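The ensemble median described above can be illustrated with a minimal sketch. The abstract does not state the software or data layout used, so everything here is an assumption for illustration: the function names `ensemble_median` and `rmse`, the model labels, and the SOC values, which are invented toy numbers and not results from the study. The sketch takes the per-site median across the four machine learning predictions and scores each model and the ensemble against observations at validation sites using root mean square error.

```python
import math
from statistics import median


def ensemble_median(predictions):
    """Return the per-site median across models.

    `predictions` maps a model name to a list of predicted values,
    with every list in the same site order (an assumed layout).
    """
    per_model = list(predictions.values())
    n_sites = len(per_model[0])
    if any(len(p) != n_sites for p in per_model):
        raise ValueError("all models must predict at the same sites")
    # zip(*per_model) yields, for each site, the values from every model.
    return [median(site_values) for site_values in zip(*per_model)]


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Toy SOC stocks at five hypothetical validation sites (invented values).
observed = [10.0, 20.0, 15.0, 30.0, 25.0]
predictions = {
    "gbm": [12.0, 18.0, 15.0, 27.0, 26.0],
    "mars": [8.0, 25.0, 11.0, 35.0, 20.0],
    "rf": [11.0, 19.0, 16.0, 29.0, 24.0],
    "svm": [9.0, 22.0, 14.0, 33.0, 28.0],
}

median_pred = ensemble_median(predictions)

for name, pred in predictions.items():
    print(f"{name}: RMSE = {rmse(observed, pred):.3f}")
print(f"ensemble median: RMSE = {rmse(observed, median_pred):.3f}")
```

With an even number of models, the median at each site is the mean of the two middle predictions, so a single model with a large error at a site has limited influence on the ensemble value there. The toy numbers only demonstrate the mechanics and say nothing about how the methods rank on real data.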