State Key Laboratory of Water Resources and Hydropower Engineering Science, Wuhan University, Wuhan, 430072, China.
China National Environmental Monitoring Center, Beijing, 100012, China.
J Environ Manage. 2023 Sep 15;342:118077. doi: 10.1016/j.jenvman.2023.118077. Epub 2023 May 18.
One critical question for water security and sustainable development is how water quality responds to changes in natural factors and human activities, especially in light of the expected exacerbation of water scarcity. Although machine learning models have shown noticeable advances in water quality attribution analysis, they offer limited interpretability: their feature importance estimates lack theoretical guarantees of consistency. To fill this gap, this study built a modelling framework that employed the inverse distance weighting method and the extreme gradient boosting model to simulate water quality at grid scale, and adapted Shapley additive explanations (SHAP) to interpret the contributions of the drivers to water quality over the Yangtze River basin. Unlike previous studies, we calculated the contribution of each feature to water quality at each grid cell within the river basin and aggregated the contributions from all grid cells into the feature importance. Our analysis revealed that the magnitude of the water quality response to drivers varied dramatically within the river basin. Air temperature had high importance for the variability of key water quality indicators (i.e., ammonia-nitrogen, total phosphorus, and chemical oxygen demand) and dominated the changes in water quality in the Yangtze River basin, especially in the upstream region. In the mid- and downstream regions, water quality was mainly affected by human activities. This study provides a modelling framework for robustly identifying feature importance by explaining the contribution of features to water quality at each grid cell.
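The grid-level attribution idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a pure-Python inverse distance weighting routine, an exact Shapley computation against a single background sample (a simplification of what SHAP does for tree models), and a hypothetical two-feature `toy_model` standing in for a fitted extreme gradient boosting model. All function names and coefficients are illustrative assumptions.

```python
from itertools import combinations
from math import factorial


def idw(stations, target, power=2.0):
    """Inverse distance weighting: estimate a value at `target`
    from a list of (x, y, value) station records."""
    num = den = 0.0
    for x, y, value in stations:
        d2 = (x - target[0]) ** 2 + (y - target[1]) ** 2
        if d2 == 0.0:
            return value  # target coincides with a station
        w = 1.0 / d2 ** (power / 2.0)  # weight = 1 / distance**power
        num += w * value
        den += w
    return num / den


def shapley_values(model, x, background):
    """Exact Shapley values for one sample. Features outside a
    coalition take their value from `background` (an assumption)."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_i = [x[j] if (j in subset or j == i) else background[j]
                          for j in range(n)]
                without_i = [x[j] if j in subset else background[j]
                             for j in range(n)]
                phi[i] += weight * (model(with_i) - model(without_i))
    return phi


def grid_importance(model, grid_features, background):
    """Mean absolute Shapley value per feature over all grid cells."""
    totals = [0.0] * len(background)
    for x in grid_features:
        for i, p in enumerate(shapley_values(model, x, background)):
            totals[i] += abs(p)
    return [t / len(grid_features) for t in totals]


def toy_model(features):
    """Hypothetical stand-in for a fitted model. Inputs are
    (air temperature, human-activity index); coefficients are made up."""
    temperature, activity = features
    return 0.5 * temperature + 0.1 * activity + 0.02 * temperature * activity
```

Calling `grid_importance(toy_model, cells, background)` with one feature vector per grid cell returns one importance score per feature. Restricting `cells` to a sub-region (for example, upstream cells only) would give region-specific rankings of the kind the abstract compares. For a real tree ensemble, exact enumeration is exponential in the number of features, so a dedicated tree-based SHAP routine would be the practical choice.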