Department of Watershed Management and Engineering, Faculty of Natural Resources, Tarbiat Modares University, Tehran, Iran.
Center for Middle Eastern Studies & Department of Water Resources Engineering, Lund University, Lund, Sweden.
Environ Monit Assess. 2019 Nov 28;191(12):777. doi: 10.1007/s10661-019-7979-x.
Arsenic (As) is one of the most hazardous elements: more than 100 million people worldwide are at risk of exposure. The permissible threshold of As in drinking water is 10 μg/L according to both the WHO drinking water guidelines and the Iranian national standard. However, several studies have indicated that As concentrations exceed this threshold in several regions of Iran. This research evaluates an As-susceptible region, the Tajan River watershed, using the following data-mining models: multivariate adaptive regression splines (MARS), functional data analysis (FDA), support vector machine (SVM), generalized linear model (GLM), multivariate discriminant analysis (MDA), and gradient boosting machine (GBM). This study considers 12 factors for elevated As concentrations: land use, drainage density, profile curvature, plan curvature, slope length, slope degree, topographic wetness index, erosion, village density, distance from villages, precipitation, and lithology. Susceptibility mapping was conducted with 70% of the data used for training and 30% for validation. For As contamination in sediment, the GLM and FDA models produced very similar classifications into four concentration levels. The GBM calculated the areas of highest arsenic contamination risk by MARS and SVM with percentages of 30.0% and 28.7%, respectively. FDA, GLM, MARS, and MDA models calculated the areas of lowest risk to be 3.3%, 23.0%, 72.0%, 25.2%, and 26.1%, respectively. The ROC curves show that MARS, SVM, and MDA had the highest accuracies, with area under the curve (AUC) values of 84.6%, 78.9%, and 79.5%, respectively. Land use, lithology, erosion, and elevation were the most important predictors of contamination potential, with values of 0.6, 0.59, 0.57, and 0.56, respectively.
Finally, these data-mining methods offer appropriate, inexpensive, and feasible options for identifying As-susceptible areas, and they can guide managers in reducing sediment contamination in the environment and the food chain.
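The evaluation procedure summarized in the abstract (a 70%/30% training and validation split, followed by comparison of models by area under the ROC curve) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `split_train_validation` and `roc_auc`, the site IDs, and the labels and scores are all hypothetical, and the AUC is computed with the standard rank-based (Mann-Whitney) formulation rather than any particular package.

```python
import random


def split_train_validation(samples, train_fraction=0.7, seed=0):
    """Shuffle samples and split them into training and validation lists."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise ranking (Mann-Whitney).

    labels: 1 for a contaminated site, 0 otherwise (hypothetical coding).
    scores: predicted susceptibility; higher means more susceptible.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    # Count positive/negative pairs ranked correctly; ties count as half.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical site IDs, split 70% / 30%.
sites = list(range(10))
train, validation = split_train_validation(sites)

# Hypothetical validation labels and model scores.
auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1])
print(len(train), len(validation), auc)  # 7 3 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is the scale on which the reported values of 84.6%, 78.9%, and 79.5% should be read.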