Huang Pu, Huang Qing, Wang Jingtian, Shi Yuhan
National Key Laboratory of Efficient Utilization of Arid and Semi-arid Arable Land in Northern China, Beijing, 100081, China.
Institute of Agricultural Resources and Regional Planning, Chinese Academy of Agricultural Sciences, Beijing, 100081, China.
Environ Monit Assess. 2025 Mar 8;197(4):367. doi: 10.1007/s10661-025-13814-z.
Comprehensive, accurate information on the spatial distribution of surface soil pH is essential for monitoring soil degradation and for providing scientific guidance to agricultural practice. This study focused on Heilongjiang Province, China, and used data from 125 soil survey sampling points. Key environmental covariates were selected as modeling inputs through Pearson correlation analysis and recursive feature elimination (RFE). Three machine learning models, support vector machine (SVM), random forest (RF), and extreme gradient boosting (XGBoost), were used to predict surface soil pH in the study area, and their results and differences were compared. The mean monthly temperature maximum (MMTmax), mean monthly precipitation minimum (MMPmin), mean annual precipitation (MAP), drought index (DI), and mean monthly wind speed maximum (MMWSmax) were the most important environmental covariates for modeling. In large, flat areas, climate variables are better suited to capturing the nonlinear relationships between soil properties and the environment during mapping. Among the mapping models, XGBoost showed the highest prediction performance (R² = 0.705, RMSE = 0.633, MAE = 0.484), followed by RF (R² = 0.688, RMSE = 0.656, MAE = 0.497), while SVM was considered unstable in this study. In the uncertainty maps, XGBoost showed lower uncertainty primarily in high-altitude mountainous forest regions, whereas RF achieved higher prediction consistency mainly in low-altitude plain areas. Each prediction model had advantages in different terrain regions, but XGBoost was regarded as the optimal model. According to the optimal model, the typical black soil in Heilongjiang Province was generally weakly acidic, with an average pH of 6.42, and pH gradually increased from east to west and from north to south.
Soil acidification occurred mainly in the meadow black soil and albic black soil regions in the eastern and northeastern parts of Heilongjiang Province. Nitrogen fertilizer application should be strictly controlled, and attention should be given to improving the soil's acid-base buffering capacity.
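
The workflow summarized above (covariate selection by RFE, then a comparison of three regressors by R², RMSE, and MAE) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' code: the data are synthetic, the hyperparameters are arbitrary, scikit-learn's `GradientBoostingRegressor` stands in for XGBoost to keep the example dependency-free, and the choice of a random forest as the RFE ranking estimator and of 5-fold cross-validation are assumptions that the abstract does not state.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.feature_selection import RFE
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import KFold, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for 125 sampling points with 12 candidate covariates,
# 5 of which drive the target. This is NOT the study's data.
X, y = make_regression(n_samples=125, n_features=12, n_informative=5,
                       noise=10.0, random_state=0)

# Recursive feature elimination: repeatedly fit the ranking estimator and
# drop the least important covariate until 5 remain (assumed target count).
selector = RFE(RandomForestRegressor(n_estimators=100, random_state=0),
               n_features_to_select=5)
selector.fit(X, y)
X_sel = X[:, selector.support_]

# Three regressors; the gradient boosting model is a stand-in for XGBoost.
models = {
    "SVM": make_pipeline(StandardScaler(), SVR(C=100.0)),
    "RF": RandomForestRegressor(n_estimators=200, random_state=0),
    "GBM (XGBoost stand-in)": GradientBoostingRegressor(random_state=0),
}

cv = KFold(n_splits=5, shuffle=True, random_state=0)


def evaluate(model, features, target):
    """Return cross-validated R2, RMSE, and MAE for one model."""
    pred = cross_val_predict(model, features, target, cv=cv)
    return {
        "R2": float(r2_score(target, pred)),
        "RMSE": float(np.sqrt(mean_squared_error(target, pred))),
        "MAE": float(mean_absolute_error(target, pred)),
    }


results = {name: evaluate(model, X_sel, y) for name, model in models.items()}
for name, metrics in results.items():
    print(name, {key: round(value, 3) for key, value in metrics.items()})
```

For brevity the sketch selects features on the full data set before cross-validation, which leaks information into the validation folds. A stricter evaluation would place the `RFE` step inside the pipeline so that selection is repeated within each fold.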