Safanelli José L, Hengl Tomislav, Parente Leandro L, Minarik Robert, Bloom Dellena E, Todd-Brown Katherine, Gholizadeh Asa, Mendes Wanderson de Sousa, Sanderman Jonathan
Woodwell Climate Research Center, Falmouth, MA, United States of America.
OpenGeoHub foundation, Wageningen, the Netherlands.
PLoS One. 2025 Jan 13;20(1):e0296545. doi: 10.1371/journal.pone.0296545. eCollection 2025.
Soil spectroscopy is a widely used method for estimating soil properties that are important to environmental and agricultural monitoring. However, a bottleneck to its wider adoption is the need to establish large reference datasets, called soil spectral libraries (SSLs), for training machine learning (ML) models. Similarly, the prediction capacity for new samples depends on the number and diversity of soil types and conditions represented in the SSLs. To help bridge this gap and enable hundreds of stakeholders to collect more affordable soil data by leveraging a centralized open resource, the Soil Spectroscopy for Global Good initiative has created the Open Soil Spectral Library (OSSL). In this paper, we describe the procedures for collecting and harmonizing several SSLs that are incorporated into the OSSL, followed by exploratory analysis and predictive modeling. The results of 10-fold cross-validation with refitting show that, in general, mid-infrared (MIR)-based models are significantly more accurate than visible and near-infrared (VisNIR) or near-infrared (NIR) models. From independent model evaluation, we found that Cubist was the best-performing ML algorithm for calibration and for delivering reliable outputs (prediction uncertainty and representation flag). Although many soil properties are well predicted, total sulfur, extractable sodium, and electrical conductivity were poorly predicted in all spectral regions, and some other extractable nutrients and physical soil properties were poorly predicted in one or two spectral regions (VisNIR or NIR). Hence, predictive models based solely on spectral variations have limitations. This study also presents and discusses several other open resources that were developed from the OSSL, aspects of opening data, current limitations, and future development. With this genuinely open science project, we hope that OSSL becomes a driver of the soil spectroscopy community to accelerate the pace of scientific discovery and innovation.
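As a rough illustration of the evaluation scheme mentioned in the abstract, the sketch below runs 10-fold cross-validation and then refits a final model on all samples. It is not the OSSL pipeline: the data are synthetic, a one-variable least-squares line stands in for Cubist, and the reading of "with refitting" (a fresh fit in every fold, plus a final fit on the full dataset) is an assumption. All function names are hypothetical.

```python
import math
import random


def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x (a simple stand-in, NOT Cubist)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(model, xs):
    intercept, slope = model
    return [intercept + slope * x for x in xs]


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )


def kfold_indices(n, k, seed=0):
    """Shuffle sample indices and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate_with_refit(xs, ys, k=10, seed=0):
    """Return (cv_rmse, final_model, held_out_predictions)."""
    folds = kfold_indices(len(xs), k, seed)
    held_out = [None] * len(xs)
    for test_idx in folds:
        test_set = set(test_idx)
        train_idx = [i for i in range(len(xs)) if i not in test_set]
        # Fit from scratch on the training folds only.
        model = fit_line([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        fold_pred = predict(model, [xs[i] for i in test_idx])
        for i, p in zip(test_idx, fold_pred):
            held_out[i] = p
    # Every sample is predicted exactly once, by a model that never saw it.
    cv_rmse = rmse(ys, held_out)
    # Assumed meaning of "refitting": the delivered model uses all samples.
    final_model = fit_line(xs, ys)
    return cv_rmse, final_model, held_out


# Synthetic data: one "spectral feature" x and a "soil property" y.
rng = random.Random(42)
xs = [rng.uniform(0.0, 1.0) for _ in range(200)]
ys = [2.0 + 3.0 * x + rng.gauss(0.0, 0.1) for x in xs]

cv_rmse, final_model, held_out = cross_validate_with_refit(xs, ys, k=10)
print("cross-validated RMSE:", round(cv_rmse, 3))
print("final model (intercept, slope):", final_model)
```

The cross-validated error estimates how the modeling procedure performs on unseen samples, while the refitted model is the one that would actually be deployed. Real spectral models use hundreds of wavelengths per sample rather than a single feature.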