Feigl Moritz, Roesky Benjamin, Herrnegger Mathew, Schulz Karsten, Hayashi Masaki
Department of Water, Atmosphere and Environment, Institute for Hydrology and Water Management, University of Natural Resources and Life Sciences, Vienna, Vienna, Austria.
BGC Engineering Inc., Toronto, Canada.
Hydrol Process. 2022 Feb;36(2):e14515. doi: 10.1002/hyp.14515. Epub 2022 Feb 24.
Typical applications of process- or physically-based models aim to gain a better process understanding or to provide the basis for a decision-making process. To adequately represent the physical system, models should include all essential processes. However, model errors can still occur. Apart from large systematic observation errors, processes that are simplified, misrepresented, inadequately parametrised or missing are potential sources of error. This study presents a set of methods and a proposed workflow for analysing errors of process-based models as a basis for relating them to process representations. The evaluated approach consists of three steps: (1) training a machine-learning (ML) error model using the input data of the process-based model and other available variables; (2) estimating local explanations (i.e., contributions of each variable to an individual prediction) for each predicted model error using SHapley Additive exPlanations (SHAP) in combination with principal component analysis; and (3) clustering the SHAP values of all predicted errors to derive groups with similar error generation characteristics. By analysing these groups with different error-variable associations, hypotheses on error generation and the corresponding processes can be formulated. This can ultimately lead to improvements in process understanding and prediction. The approach is applied to the process-based stream water temperature model HFLUX in a case study of an alpine stream in the Canadian Rocky Mountains. Using available meteorological and hydrological variables as inputs, the applied ML model is able to predict model residuals. Clustering of SHAP values results in three distinct error groups that are mainly related to shading and vegetation-emitted longwave radiation. Model errors are rarely random and often contain valuable information. Assessing model error associations is ultimately a way of enhancing trust in implemented processes and of providing information on potential areas of improvement to the model.
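The three steps can be illustrated with a minimal sketch on synthetic data. Everything in it is an assumption made for illustration and not the study's implementation: the predictors and residuals are randomly generated, ordinary least squares stands in for the ML error model, SHAP values are computed in closed form (for a linear model with features treated as independent, the contribution of feature j is w_j (x_j - mean(x_j))), and PCA is applied to the SHAP values before a simple k-means clustering, since the abstract does not specify where PCA enters the workflow. The choice of three clusters only mirrors the number of error groups reported above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins (hypothetical): three predictors and the residual
# of a process-based model. No real data from the study is used.
n = 300
X = rng.normal(size=(n, 3))
residual = 1.5 * X[:, 0] - 0.5 * X[:, 2] + 0.1 * rng.normal(size=n)

# Step 1: error model. Ordinary least squares stands in for the ML model.
A = np.column_stack([X, np.ones(n)])
coef, *_ = np.linalg.lstsq(A, residual, rcond=None)
w, b = coef[:-1], coef[-1]
prediction = A @ coef

# Step 2: local explanations. For a linear model with features treated as
# independent, the SHAP value of feature j is w_j * (x_j - mean(x_j)).
shap_values = (X - X.mean(axis=0)) * w
base_value = b + X.mean(axis=0) @ w  # mean predicted residual

# PCA of the SHAP values via SVD, keeping two components (assumed ordering).
centered = shap_values - shap_values.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
scores = centered @ vt[:2].T


# Step 3: cluster the explanations with a basic k-means (Lloyd's algorithm).
def kmeans(points, k, n_iter=50):
    centers = points[:k].copy()  # deterministic initialisation
    labels = np.zeros(len(points), dtype=int)
    for _ in range(n_iter):
        dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = points[labels == c].mean(axis=0)
    return labels, centers


labels, centers = kmeans(scores, k=3)

# Mean SHAP value per variable within each cluster: the quantity one would
# inspect to form hypotheses about how each error group is generated.
for c in range(3):
    if np.any(labels == c):
        print(c, shap_values[labels == c].mean(axis=0).round(2))
```

Each row of `shap_values` sums, together with `base_value`, to the predicted residual for that time step, which is the additivity property that makes the per-variable contributions comparable across predictions. With a non-linear error model the closed form no longer holds and a dedicated SHAP estimator would be needed.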