Soil and Water Sciences Department, University of Florida, 2181 McCarty Hall, P.O. Box 110290, Gainesville, FL 32611, USA.
Sensors (Basel). 2022 Apr 21;22(9):3187. doi: 10.3390/s22093187.
The United States Natural Resources Conservation Service (NRCS) maintains a soil database containing data collected across the country over the last several decades, including soil spectral scans. These data are available, but they may not be used to their full potential. For this study, pedon, horizon, and spectral data were extracted from the database for samples collected from 2011 to 2015. Only sites that had been fully described and horizons that had been analyzed for the full suite of desired properties were used. This yielded over 14,000 samples for modeling eight soil properties: soil organic carbon (SOC); total nitrogen (TN); total sulfur (TS); clay; sand; exchangeable calcium (Ca); cation exchange capacity (CEC); and pH. Four chemometric methods were employed for soil property prediction: partial least squares regression (PLSR); Random Forest (RF); Cubist; and multivariate adaptive regression splines (MARS). Because the dataset was sufficiently large, simple random subsetting was used to create the calibration (70%) and validation (30%) sets. SOC, TN, and TS had the strongest prediction results, with an R value of over 0.9. Ca, CEC, and pH were predicted moderately well. Clay and sand models had slightly lower performance. Of the four methods, Cubist produced the strongest models, while PLSR produced the weakest. This may be due to complex relationships between soil properties and spectra that PLSR could not capture. The only drawback of Cubist is that its variable importance is difficult to interpret. Future research should include environmental variables to improve prediction results. Future work may also avoid PLSR when dealing with large datasets that cover large areas and have high degrees of variability.
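The random 70%/30% calibration and validation split described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code: the function names, the fixed seed, and the round figure of 14,000 sample identifiers are hypothetical, and the abstract reports only an "R value", so the coefficient of determination computed here is an assumed choice of fit statistic.

```python
import random


def split_calibration_validation(samples, calibration_fraction=0.7, seed=0):
    """Randomly partition samples into calibration and validation lists."""
    shuffled = list(samples)
    # A seeded generator keeps the split reproducible (the seed is an assumption).
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * calibration_fraction)
    return shuffled[:cut], shuffled[cut:]


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical sample identifiers standing in for the "over 14,000" samples.
sample_ids = list(range(14000))
calibration, validation = split_calibration_validation(sample_ids)
print(len(calibration), len(validation))  # prints "9800 4200"
```

In practice, a model fitted on the calibration samples would be scored by passing the observed and predicted values for the validation samples to a function such as `r_squared`.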