Department of Bioresource Engineering, McGill University, 21111 Lakeshore, Ste Anne de Bellevue, Quebec, H9X3V9, Canada.
Faculty of Civil Engineering, University of Tabriz, 29 Bahman Blvd., Tabriz, 5166616471, Iran.
Ground Water. 2020 Sep;58(5):723-734. doi: 10.1111/gwat.12963. Epub 2019 Dec 18.
Groundwater remains the primary source of safe drinking and irrigation water in the Maku Plain of northwest Iran, but it is prone to fluoride contamination. Accordingly, modeling techniques that accurately predict groundwater fluoride concentration are required. The current paper advances several novel data mining algorithms to predict groundwater fluoride concentration: lazy learners [instance-based K-nearest neighbors (IBK), locally weighted learning (LWL), and KStar], a tree-based algorithm (M5P), and a meta-classifier algorithm [regression by discretization (RBD)]. Several models predicting groundwater fluoride concentration were developed from groundwater quality variables (e.g., concentrations) measured in each of 143 samples collected between 2004 and 2008. The full dataset was divided into two subsets: 70% for model training (calibration) and 30% for model evaluation (validation). Models were validated using several statistical evaluation criteria and three visual evaluation approaches (i.e., scatter plots, Taylor diagrams, and violin diagrams). Although Na and Ca showed the greatest positive and negative correlations with fluoride (r = 0.59 and -0.39, respectively), they were insufficient on their own to reliably predict fluoride levels; therefore, other water quality variables, including those weakly correlated with fluoride, should be considered as inputs for fluoride prediction. The IBK model outperformed the other models in predicting fluoride contamination, followed by KStar, RBD, M5P, and LWL. The RBD and M5P models were the least accurate at predicting peaks in fluoride concentration. The results of the current study can be used to support practical and sustainable management of water and groundwater resources.
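The workflow summarized in the abstract (correlation screening, a 70/30 calibration/validation split, and an instance-based nearest-neighbor predictor) can be illustrated with a minimal sketch. This is not the authors' implementation or data: the samples are synthetic, the two inputs merely stand in for scaled Na and Ca, the coefficients in `make_synthetic_samples` are arbitrary, and the value of `k`, the Euclidean distance, and RMSE as the evaluation criterion are all assumptions made for illustration. Every function name here is hypothetical.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def split_samples(rows, train_fraction=0.7, seed=0):
    """Shuffle a copy of the rows, then cut into calibration and validation sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def knn_predict(train, query, k=3):
    """Average the targets of the k training rows nearest to the query point."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    return sum(target for _, target in nearest) / len(nearest)


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def make_synthetic_samples(n=143, seed=42):
    """SYNTHETIC data only: two inputs scaled to [0, 1] and an invented target.

    The coefficients are arbitrary and are NOT taken from the study.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        na = rng.uniform(0.0, 1.0)  # stand-in for scaled Na
        ca = rng.uniform(0.0, 1.0)  # stand-in for scaled Ca
        fluoride = 1.0 + 1.5 * na - 0.8 * ca + rng.gauss(0.0, 0.1)
        rows.append(((na, ca), fluoride))
    return rows


def run_demo():
    """Screen correlations, split 70/30, and score a k-NN predictor."""
    rows = make_synthetic_samples()
    train, test = split_samples(rows)
    observed = [target for _, target in test]
    predicted = [knn_predict(train, features) for features, _ in test]
    # Baseline for comparison: always predict the calibration-set mean.
    train_mean = sum(target for _, target in train) / len(train)
    return {
        "n_train": len(train),
        "n_test": len(test),
        "r_na": pearson_r([f[0] for f, _ in rows], [t for _, t in rows]),
        "r_ca": pearson_r([f[1] for f, _ in rows], [t for _, t in rows]),
        "rmse_knn": rmse(observed, predicted),
        "rmse_mean": rmse(observed, [train_mean] * len(test)),
    }


if __name__ == "__main__":
    for name, value in run_demo().items():
        print(name, value)
```

With real measurements the inputs would need to be put on a common scale before computing distances, because the nearest-neighbor search is sensitive to the units of each variable; the synthetic inputs above are already in [0, 1].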