Geographic Information Science Research Group, Ton Duc Thang University, Ho Chi Minh City, Viet Nam; Faculty of Environment and Labour Safety, Ton Duc Thang University, Ho Chi Minh City, Viet Nam.
School of Engineering, University of Guelph, Guelph, Canada.
Sci Total Environ. 2020 Jun 15;721:137612. doi: 10.1016/j.scitotenv.2020.137612. Epub 2020 Mar 3.
River water quality assessment is one of the most important tasks for improving water resources management plans. A water quality index (WQI) considers several water quality variables simultaneously. Traditional WQI calculations are time-consuming and are often prone to errors in the derivation of sub-indices. In this study, 4 standalone algorithms (random forest (RF), M5P, random tree (RT), and reduced error pruning tree (REPT)) and 12 hybrid data-mining algorithms (combinations of the standalone algorithms with bagging (BA), CV parameter selection (CVPS) and randomizable filtered classification (RFC)) were used to predict the Iran WQI (IRWQI). Six years (2012 to 2018) of monthly data from two water quality monitoring stations within the Talar catchment were compiled. Using Pearson correlation coefficients, 10 different input combinations were constructed. The data were divided into two groups (ratio 70:30), a training dataset for model building and a testing dataset for model validation, using a 10-fold cross-validation technique. The models were evaluated using several statistical and visual evaluation metrics. Results show that fecal coliform (FC) and total solids (TS) had the greatest and the least effect, respectively, on the prediction of IRWQI. The best input combinations varied among the algorithms; in general, variables with very low correlations yielded weaker performance. Hybrid algorithms improved the prediction power of several of the standalone models, but not all. Hybrid BA-RT outperformed the other models (R = 0.941, RMSE = 2.71, MAE = 1.87, NSE = 0.941, PBIAS = 0.500). PBIAS indicated that all algorithms, with the exceptions of RT, BA-RT and CVPS-REPT, overestimated WQI values.
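The abstract reports five evaluation statistics (R, RMSE, MAE, NSE, PBIAS) without giving their formulas. The following is a minimal sketch of how they are commonly computed, assuming the standard definitions: Pearson correlation for R, Nash-Sutcliffe efficiency for NSE, and PBIAS as 100 × Σ(observed − predicted) / Σ(observed). Under that PBIAS convention a positive value indicates underestimation, which is consistent with the reported PBIAS = 0.500 for BA-RT, one of the models the abstract lists as not overestimating. The function name `evaluation_metrics` and the sample values are hypothetical and are not taken from the study.

```python
import math


def evaluation_metrics(observed, predicted):
    """Return R, RMSE, MAE, NSE and PBIAS for paired observed/predicted values.

    Assumed (standard) definitions; the abstract does not state the formulas.
    """
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("observed and predicted must be non-empty and equal in length")

    n = len(observed)
    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n

    # Residuals are taken as observed minus predicted.
    errors = [o - p for o, p in zip(observed, predicted)]
    sum_sq_error = sum(e * e for e in errors)

    rmse = math.sqrt(sum_sq_error / n)
    mae = sum(abs(e) for e in errors) / n

    # Sums of squares about the means, used by both R and NSE.
    ss_obs = sum((o - mean_obs) ** 2 for o in observed)
    ss_pred = sum((p - mean_pred) ** 2 for p in predicted)
    covariance = sum(
        (o - mean_obs) * (p - mean_pred) for o, p in zip(observed, predicted)
    )

    r = covariance / math.sqrt(ss_obs * ss_pred)   # Pearson correlation
    nse = 1.0 - sum_sq_error / ss_obs              # Nash-Sutcliffe efficiency
    pbias = 100.0 * sum(errors) / sum(observed)    # > 0 means underestimation

    return {"R": r, "RMSE": rmse, "MAE": mae, "NSE": nse, "PBIAS": pbias}


if __name__ == "__main__":
    # Hypothetical index values, for illustration only.
    observed_wqi = [50.0, 60.0, 70.0, 80.0]
    predicted_wqi = [52.0, 58.0, 71.0, 83.0]
    for name, value in evaluation_metrics(observed_wqi, predicted_wqi).items():
        print(f"{name}: {value:.3f}")
```

In this sketch a model whose predictions are on average higher than the observations produces a negative PBIAS, so the sign alone separates over- from underestimating models.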