
Interpretation of ensemble learning to predict water quality using explainable artificial intelligence.

Author Information

Park Jungsu, Lee Woo Hyoung, Kim Keug Tae, Park Cheol Young, Lee Sanghun, Heo Tae-Young

Affiliations

Department of Civil and Environmental Engineering, Hanbat National University, 125 Dongseo-daero, Yuseong-gu, Daejeon 34158, Republic of Korea.

Department of Civil, Environmental and Construction Engineering, University of Central Florida, 12800 Pegasus Dr., Orlando, FL 32816, USA.

Publication Information

Sci Total Environ. 2022 Aug 1;832:155070. doi: 10.1016/j.scitotenv.2022.155070. Epub 2022 Apr 6.

Abstract

Algal blooms are a significant issue in freshwater quality management; in particular, predicting algal concentration is essential to maintaining the safety of the drinking water supply system. The chlorophyll-a (Chl-a) concentration is a commonly used indicator for estimating algal concentration. In this study, an XGBoost ensemble machine learning (ML) model was developed from eighteen input variables to predict Chl-a concentration. The composition and pretreatment of the model's input variables are important factors for improving model performance. Explainable artificial intelligence (XAI) is an emerging area of ML modeling that provides a reasonable interpretation of model performance. The effect of input variable selection on model performance was estimated, where the priority of input variable selection was determined using three indices: the Shapley value (SHAP), feature importance (FI), and the variance inflation factor (VIF). SHAP analysis is an XAI algorithm designed to compute the relative importance of input variables consistently, providing an interpretable analysis of model predictions. The XGB models simulated with the independent variables selected by the three indices were evaluated using the root mean square error (RMSE), the RMSE-observation standard deviation ratio (RSR), and the Nash-Sutcliffe efficiency (NSE). This study shows that the model exhibited the most stable performance when the priority of input variables was determined by SHAP. This implies that on-site monitoring can be designed to collect only the input variables selected by the SHAP analysis, reducing the cost of overall water quality analysis. The independent variables were further analyzed using SHAP summary, force, target, and partial dependency plots to provide an understandable interpretation of the XGB model's performance. While XAI is still in the early stages of development, this study demonstrates a good example of applying XAI to improve the interpretation of machine learning model performance in predicting water quality.
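The workflow described above (an XGBoost regressor, SHAP-based ranking of input variables, and evaluation with RMSE, RSR, and NSE) can be illustrated with a short sketch. The Python example below, using the xgboost and shap packages, is a minimal hypothetical reconstruction: the predictor names, synthetic data, and hyperparameters are placeholders and do not reproduce the study's actual 18 variables or results. For the metrics, RSR is the RMSE divided by the standard deviation of the observations, and NSE is one minus the ratio of the squared model error to the variance of the observations.

```python
# Minimal sketch of the abstract's workflow: fit an XGBoost regressor,
# rank input variables by mean absolute SHAP value, and score the model
# with RMSE, RSR, and NSE. All variable names and data are illustrative.
import numpy as np
import pandas as pd
import shap
import xgboost as xgb
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 500
X = pd.DataFrame(
    rng.normal(size=(n, 5)),
    columns=["water_temp", "turbidity", "total_n", "total_p", "do"],  # hypothetical predictors
)
# Synthetic proxy for Chl-a concentration, driven by two of the predictors.
y = 2.0 * X["total_p"] + 1.5 * X["water_temp"] + rng.normal(scale=0.5, size=n)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

model = xgb.XGBRegressor(n_estimators=300, max_depth=4, learning_rate=0.05)
model.fit(X_train, y_train)

# SHAP values give a consistent per-variable contribution to each prediction;
# averaging their absolute values ranks the inputs, as in the paper's variable selection.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)
mean_abs_shap = pd.Series(np.abs(shap_values).mean(axis=0), index=X.columns).sort_values(ascending=False)
print(mean_abs_shap)

# Evaluation metrics named in the abstract.
pred = model.predict(X_test)
rmse = float(np.sqrt(np.mean((y_test - pred) ** 2)))
rsr = rmse / float(np.std(y_test))  # RMSE-observation standard deviation ratio
nse = 1.0 - np.sum((y_test - pred) ** 2) / np.sum((y_test - y_test.mean()) ** 2)  # Nash-Sutcliffe efficiency
print(f"RMSE={rmse:.3f}, RSR={rsr:.3f}, NSE={nse:.3f}")

# shap.summary_plot(shap_values, X_test)  # corresponds to the summary plot referenced in the abstract
```

Ranking by mean absolute SHAP value is one common way to prioritize variables; the paper compares this ordering against FI- and VIF-based orderings, which are not reproduced here.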
