Geographic Information Science Research Group, Ton Duc Thang University, Ho Chi Minh City, Viet Nam; Faculty of Environment and Labour Safety, Ton Duc Thang University, Ho Chi Minh City, Viet Nam.
School of Engineering, University of Guelph, ON, Canada.
Sci Total Environ. 2020 May 1;715:136836. doi: 10.1016/j.scitotenv.2020.136836. Epub 2020 Jan 24.
Groundwater is the main source of clean fresh water for domestic use and is essential for food production in the agricultural sector. It plays a vital role in water supply in the Campanian Plain in Italy, so the future sustainability of the resource is essential for the region. In this paper, novel data mining algorithms, including Gaussian Process (GP), were applied to a large groundwater quality database to predict concentrations of nitrate (a contaminant) and strontium (potentially increasing in the future) in groundwater. The results were compared with those of the M5P, random forest (RF), and random tree (RT) algorithms, used as benchmarks to test the robustness of the modeling process. The dataset comprises 246 groundwater quality samples from different wells, both municipal and agricultural. For modeling, it was divided into two subsets using the 10-fold cross-validation technique: 173 samples for model building (training dataset) and 73 samples for model validation (testing dataset). Water quality variables including T, pH, EC, HCO3, F, Cl, SO4, Na, K, Mg, and Ca were used as model inputs. In the first stage, different input combinations were constructed based on the correlation coefficient, and the optimal combination was chosen for the modeling phase. Several quantitative criteria, together with visual comparison, were used to evaluate modeling capability. The results revealed that, to obtain reliable predictions, variables with low correlation should also be included as model inputs alongside those with high correlation coefficients. According to the evaluation criteria, the GP algorithm outperformed all other models in predicting both nitrate and strontium concentrations, followed by RF, M5P, and RT, in that order. The results also showed that the model structure, together with the accuracy and structure of the data, can have a relevant impact on model results.
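The correlation-based construction of input combinations mentioned above can be illustrated with a minimal sketch. The abstract does not state how the combinations were formed, so the nested scheme here (add one variable at a time in order of decreasing absolute correlation with the target) is an assumption, as are the function names, the synthetic data, and the choice of three inputs; none of it reproduces the authors' dataset or results.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def ranked_input_combinations(inputs, target):
    """Rank inputs by |r| with the target and return nested combinations.

    Hypothetical scheme: combination k holds the k most correlated inputs,
    so the last combinations also include the weakly correlated variables.
    """
    ranked = sorted(
        inputs,
        key=lambda name: abs(pearson(inputs[name], target)),
        reverse=True,
    )
    return [ranked[:k] for k in range(1, len(ranked) + 1)]


# Synthetic stand-in data (assumption): 246 samples, three candidate inputs.
rng = random.Random(0)
n_samples = 246
cl = [rng.gauss(0, 1) for _ in range(n_samples)]
k_ion = [rng.gauss(0, 1) for _ in range(n_samples)]
ph = [rng.gauss(0, 1) for _ in range(n_samples)]  # unrelated to the target

# Synthetic target: strongly tied to Cl, weakly to K, not at all to pH.
nitrate = [
    2.0 * c + 1.0 * k + rng.gauss(0, 0.5) for c, k in zip(cl, k_ion)
]

combos = ranked_input_combinations({"Cl": cl, "K": k_ion, "pH": ph}, nitrate)
for combo in combos:
    print(combo)
```

Each combination would then be passed to the candidate models (GP, RF, M5P, RT) and scored on the held-out samples, with the best-scoring combination kept for the modeling phase.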