In search of an optimum sampling algorithm for prediction of soil properties from infrared spectra.

Authors

Ng Wartini, Minasny Budiman, Malone Brendan, Filippi Patrick

Affiliation

Faculty of Science: School of Life and Environmental Sciences, University of Sydney, Sydney, New South Wales, Australia.

Publication information

PeerJ. 2018 Oct 3;6:e5722. doi: 10.7717/peerj.5722. eCollection 2018.

DOI:10.7717/peerj.5722
PMID:30310751
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6173947/
Abstract

BACKGROUND

Visible-near infrared (vis-NIR) spectroscopy has recently attracted considerable interest as a tool for rapid soil characterisation. Soil absorbance spectra in the visible-infrared range can be calibrated with regression models to predict a set of soil properties. The accuracy of these regression models relies heavily on the calibration set: an optimum sample size and a representative set of samples can further improve model performance. However, there is no guideline on which sampling method to use for datasets of different sizes.

METHODS

Here, we show that different sampling algorithms perform differently under different data sizes and different regression models (Cubist regression tree and partial least squares regression (PLSR)). We analysed the effect of three sampling algorithms, Kennard-Stone (KS), conditioned Latin hypercube sampling (cLHS) and k-means clustering (KM), against random sampling on the prediction of up to five soil properties (sand, clay, carbon content, cation exchange capacity and pH) on three datasets. These datasets have different coverages: a European continental dataset (LUCAS, n = 5,639), a regional dataset from Australia (Geeves, n = 379), and a local dataset from New South Wales, Australia (Hillston, n = 384). Calibration sample sizes ranging from 50 to 3,000 were derived and tested for the continental dataset, and from 50 to 200 samples for the regional and local datasets.
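As an illustration of how a KS-style selection works, here is a minimal sketch in plain Python. It assumes the textbook Kennard-Stone rule (seed with the two most distant samples, then repeatedly add the sample farthest from the already selected set) with Euclidean distance on the raw feature vectors. The function name `kennard_stone` and the toy data are hypothetical, and the abstract does not state which implementation, distance metric, or spectral preprocessing the authors used.

```python
import math
from itertools import combinations


def kennard_stone(X, k):
    """Return the indices of k rows of X picked by the Kennard-Stone rule.

    X is a list of equal-length numeric sequences (e.g. spectra or scores).
    Euclidean distance is an assumption made for this sketch.
    """
    n = len(X)
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= len(X)")

    def dist(i, j):
        return math.dist(X[i], X[j])

    # Seed the calibration set with the two most distant samples.
    first, second = max(combinations(range(n), 2), key=lambda p: dist(*p))
    selected = [first, second]

    # min_d[i] is the distance from sample i to its nearest selected sample.
    min_d = [min(dist(i, first), dist(i, second)) for i in range(n)]

    while len(selected) < k:
        # Add the unselected sample that is farthest from the selected set.
        candidates = (i for i in range(n) if i not in selected)
        nxt = max(candidates, key=lambda i: min_d[i])
        selected.append(nxt)
        min_d = [min(min_d[i], dist(i, nxt)) for i in range(n)]

    return selected


# Toy one-dimensional "spectra": the extremes are picked first, then the middle.
toy = [[0.0], [1.0], [10.0], [5.0], [4.9]]
print(kennard_stone(toy, 3))
```

Because the rule always favours the most extreme remaining sample, the selection covers the edges of the feature space first, which is one plausible reason its benefit could differ between soil properties.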

RESULTS

Overall, PLSR gives better predictions than the Cubist model across the soil properties tested. It is also less sensitive to the choice of sampling algorithm. The KM algorithm is more representative in the larger dataset up to a certain calibration sample size. The KS algorithm appears to be more efficient than random sampling in small datasets; however, its prediction performance varied considerably between soil properties. The cLHS algorithm is the most robust sampling method for multiple soil properties regardless of the sample size.

DISCUSSION

Our results suggest that the optimum calibration sample size depends on how much generalisation the model has to achieve. Using a sampling algorithm is more beneficial for larger datasets than for smaller ones, where only small improvements can be made. KM is suitable for large datasets, KS is efficient in small datasets but its results can be variable, while cLHS is less affected by sample size.
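For comparison, a KM-style selection can be sketched as follows. This is only one common way to turn k-means clusters into a calibration set: cluster the samples with Lloyd's algorithm, then keep the real sample nearest each centroid. The function names, the evenly spaced deterministic initialisation, and the nearest-to-centroid rule are assumptions made for this example, not details taken from the abstract.

```python
import math


def _assign(X, centroids):
    """Label each row of X with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda c: math.dist(x, centroids[c]))
        for x in X
    ]


def kmeans_sample(X, k, iters=50):
    """Return one sample index per non-empty k-means cluster.

    Each returned index is the real sample closest to a cluster centroid.
    """
    if not 1 <= k <= len(X):
        raise ValueError("k must satisfy 1 <= k <= len(X)")

    # Deterministic start (assumption): take k rows spread evenly through X.
    centroids = [list(X[(i * len(X)) // k]) for i in range(k)]

    for _ in range(iters):
        labels = _assign(X, centroids)
        updated = []
        for c in range(k):
            members = [x for x, lab in zip(X, labels) if lab == c]
            if members:
                # New centroid is the column-wise mean of the cluster members.
                updated.append([sum(col) / len(members) for col in zip(*members)])
            else:
                # Keep the old centroid if a cluster ends up empty.
                updated.append(centroids[c])
        if updated == centroids:
            break
        centroids = updated

    labels = _assign(X, centroids)
    picks = []
    for c in range(k):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        if idx:
            picks.append(min(idx, key=lambda i: math.dist(X[i], centroids[c])))
    return picks


# Two well-separated toy groups: one representative is kept from each.
toy = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
print(kmeans_sample(toy, 2))
```

Unlike the KS rule, this selection favours samples near the centre of each group rather than the extremes of the feature space.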

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/909e/6173947/0abb65c9dbe1/peerj-06-5722-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/909e/6173947/e58f63ce21de/peerj-06-5722-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/909e/6173947/7606102c1b66/peerj-06-5722-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/909e/6173947/12fa6ff379ea/peerj-06-5722-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/909e/6173947/f8f9d0a108aa/peerj-06-5722-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/909e/6173947/9af0bb989b7d/peerj-06-5722-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/909e/6173947/4c1ca9f2b45a/peerj-06-5722-g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/909e/6173947/def47aaae62d/peerj-06-5722-g008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/909e/6173947/fe9ffd6947c9/peerj-06-5722-g009.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/909e/6173947/91596a705441/peerj-06-5722-g010.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/909e/6173947/7c59cddf5c1c/peerj-06-5722-g011.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/909e/6173947/a356c50219a5/peerj-06-5722-g012.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/909e/6173947/fc31d5554f2f/peerj-06-5722-g013.jpg

Similar articles

1
In search of an optimum sampling algorithm for prediction of soil properties from infrared spectra.
PeerJ. 2018 Oct 3;6:e5722. doi: 10.7717/peerj.5722. eCollection 2018.
2
Predicting Soil Properties and Interpreting Vis-NIR Models from across Continental United States.
Sensors (Basel). 2022 Apr 21;22(9):3187. doi: 10.3390/s22093187.
3
Calibration models database of near infrared spectroscopy to predict agricultural soil fertility properties.
Data Brief. 2020 Apr 8;30:105469. doi: 10.1016/j.dib.2020.105469. eCollection 2020 Jun.
4
Comparing the effect of different sample conditions and spectral libraries on the prediction accuracy of soil properties from near- and mid-infrared spectra at the field-scale.
Soil Tillage Res. 2022 Jan;215:105196. doi: 10.1016/j.still.2021.105196.
5
Combining Laser-Induced Breakdown Spectroscopy and Visible Near-Infrared Spectroscopy for Predicting Soil Organic Carbon and Texture: A Danish National-Scale Study.
Sensors (Basel). 2024 Jul 10;24(14):4464. doi: 10.3390/s24144464.
6
Evaluation of Two Portable Hyperspectral-Sensor-Based Instruments to Predict Key Soil Properties in Canadian Soils.
Sensors (Basel). 2022 Mar 26;22(7):2556. doi: 10.3390/s22072556.
7
Application of Low-Cost MEMS Spectrometers for Forest Topsoil Properties Prediction.
Sensors (Basel). 2021 Jun 7;21(11):3927. doi: 10.3390/s21113927.
8
Soil organic carbon content estimation with laboratory-based visible-near-infrared reflectance spectroscopy: feature selection.
Appl Spectrosc. 2014;68(8):831-7. doi: 10.1366/13-07294.
9
Evaluation of Optimized Preprocessing and Modeling Algorithms for Prediction of Soil Properties Using VIS-NIR Spectroscopy.
Sensors (Basel). 2021 Oct 11;21(20):6745. doi: 10.3390/s21206745.
10
Soil Organic Carbon Prediction Based on Vis-NIR Spectral Classification Data Using GWPCA-FCM Algorithm.
Sensors (Basel). 2024 Jul 30;24(15):4930. doi: 10.3390/s24154930.

Cited by

1
A partitioned conditioned Latin hypercube sampling method considering spatial heterogeneity in digital soil mapping.
Sci Rep. 2025 Apr 14;15(1):12851. doi: 10.1038/s41598-025-95631-5.
2
Assessing the effectiveness of ground truth data to capture landscape variability from an agricultural region using Gaussian simulation and geostatistical techniques.
Heliyon. 2021 Jun 29;7(7):e07439. doi: 10.1016/j.heliyon.2021.e07439. eCollection 2021 Jul.
