College of Environmental & Resource Sciences, Zhejiang University, Hangzhou, Zhejiang, 310058, China.
Department of Civil and Environmental Engineering, Case Western Reserve University, Cleveland, OH, 44106, United States.
Environ Res. 2021 Nov;202:111660. doi: 10.1016/j.envres.2021.111660. Epub 2021 Jul 12.
A systematic understanding of the spatial distribution of water quality is critical for successful watershed management; however, the limited number of physical monitoring stations has restricted both the evaluation of that distribution and the identification of the features that affect water quality. To fill this gap, we developed a modeling process that used random forest regression (RFR) to model the water quality distribution for the Taihu Lake basin in Zhejiang Province, China, and adopted the Shapley Additive exPlanations (SHAP) method to interpret the underlying driving forces. We first used RFR to model three water quality parameters, permanganate index (COD), total phosphorus (TP), and total nitrogen (TN), based on 16 watershed features. We then applied the fitted models to generate water quality distribution maps for the basin, with COD ranging from 1.39 to 6.40 mg/L, TP from 0.02 to 0.23 mg/L, and TN from 1.43 to 4.27 mg/L. These maps showed generally consistent patterns among COD, TN, and TP, with minor differences in their spatial distribution. The SHAP analysis showed that TN was mainly affected by agricultural non-point sources, while COD and TP were affected by both agricultural and domestic sources. Because sewage collection and treatment differ between urban and rural areas, water quality in highly populated urban areas was better than in rural areas, which led to an unexpected positive relationship between water quality and population density. Overall, with the RFR models and SHAP interpretation, we obtained a continuous distribution pattern of water quality and identified its driving forces in the basin. These findings provide important information to assist water quality restoration projects.
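To make the SHAP step concrete, the following minimal Python sketch computes exact Shapley values for a single prediction by enumerating feature coalitions. It is an illustration only, not the authors' pipeline: `toy_tn_model`, its three features (`cropland`, `population`, `slope`), its coefficients, and the `baseline` and `sample` values are all hypothetical stand-ins for a fitted RFR model and real watershed data. SHAP implementations for tree ensembles use faster tree-specific algorithms; brute-force enumeration is swapped in here because it needs only the standard library and makes the definition visible.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one sample `x` relative to `baseline`.

    Features outside a coalition are replaced by their baseline value.
    Cost grows as 2^n, so this is only practical for a handful of features.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for a coalition of this size that excludes i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                members = set(subset)
                without_i = [x[j] if j in members else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                # Marginal contribution of feature i to this coalition.
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


def toy_tn_model(features):
    """Hypothetical stand-in for a trained TN model (made-up coefficients)."""
    cropland, population, slope = features
    return 1.5 + 2.0 * cropland - 0.5 * population + 0.3 * cropland * slope


# Hypothetical feature values: a reference catchment and one catchment to explain.
baseline = [0.2, 0.5, 0.1]
sample = [0.6, 0.9, 0.4]

phi = shapley_values(toy_tn_model, sample, baseline)
for name, value in zip(["cropland", "population", "slope"], phi):
    print(f"{name}: {value:+.3f}")
```

The attributions sum to the difference between the model output for `sample` and for `baseline`, which is the additive property that lets each feature's contribution be read as a share of the predicted change. In this toy model the negative attribution for `population` comes purely from the made-up coefficient and carries no evidential weight.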