Rahmati Omid, Choubin Bahram, Fathabadi Abolhasan, Coulon Frederic, Soltani Elinaz, Shahabi Himan, Mollaefar Eisa, Tiefenbacher John, Cipullo Sabrina, Ahmad Baharin Bin, Tien Bui Dieu
Geographic Information Science Research Group, Ton Duc Thang University, Ho Chi Minh City, Viet Nam; Faculty of Environment and Labour Safety, Ton Duc Thang University, Ho Chi Minh City, Viet Nam.
Faculty of Natural Resources, University of Tehran, Karaj, Iran.
Sci Total Environ. 2019 Oct 20;688:855-866. doi: 10.1016/j.scitotenv.2019.06.320. Epub 2019 Jun 21.
Although estimating the uncertainty of models used to predict nitrate contamination of groundwater is essential to groundwater management, it has generally been ignored. This gap motivated the present study, which explores the predictive uncertainty of machine-learning (ML) models in this field using two residual-based uncertainty methods: quantile regression (QR) and uncertainty estimation based on local errors and clustering (UNEEC). Prediction-interval coverage probability (PICP), the most important of the statistical measures of uncertainty, was used to evaluate uncertainty. Three state-of-the-art ML models, support vector machine (SVM), random forest (RF), and k-nearest neighbor (kNN), were selected to spatially model groundwater nitrate concentrations. The models were calibrated with nitrate concentrations from 80 wells (70% of the data) and validated with nitrate concentrations from 34 wells (30% of the data). Both uncertainty and predictive performance criteria should be considered when comparing and selecting the best model. The results indicate that kNN is the best model: it had the lowest uncertainty based on the PICP statistic under both the QR method (0.94) and the UNEEC method (0.85-0.91 across all clusters), and its predictive performance (RMSE = 10.63, R = 0.71) was similar to that of RF (RMSE = 10.41, R = 0.72) and better than that of SVM (RMSE = 13.28, R = 0.58). Determining the uncertainty of ML models used to spatially model groundwater nitrate pollution enables managers to make better risk-based decisions and consequently increases the reliability and credibility of groundwater nitrate predictions.
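As a rough illustration of the PICP statistic mentioned in the abstract, the sketch below assumes the common definition: the fraction of observed values that fall inside their prediction intervals. The function name `picp` and all numbers are hypothetical and are not taken from the study; the abstract does not describe how the authors implemented the calculation.

```python
import numpy as np

def picp(y_true, lower, upper):
    """Fraction of observations that fall inside [lower, upper] (inclusive)."""
    y_true = np.asarray(y_true, dtype=float)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    inside = (y_true >= lower) & (y_true <= upper)
    return float(inside.mean())

# Hypothetical nitrate observations and prediction-interval bounds
# (illustrative values only, not data from the paper).
observed = np.array([12.0, 30.5, 8.2, 45.0, 22.1])
lower = np.array([5.0, 20.0, 9.0, 30.0, 15.0])
upper = np.array([20.0, 40.0, 18.0, 50.0, 30.0])

coverage = picp(observed, lower, upper)
print(f"PICP = {coverage:.2f}")
```

Here four of the five hypothetical observations lie inside their intervals, so the coverage is 0.80. In practice the bounds would come from an uncertainty method such as QR or UNEEC applied to the model residuals.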