Interpret Gaussian Process Models by Using Integrated Gradients.

Author information

Zhang Fan, Ono Naoaki, Kanaya Shigehiko

Affiliations

Division of Information Science, Graduate School of Science and Technology, Nara Institute of Science and Technology, 8916-5 Takayama, Ikoma, Nara, 630-0192, Japan.

Data Science Center, Graduate School of Science and Technology, Nara Institute of Science and Technology, 8916-5 Takayama, Ikoma, Nara, 630-0192, Japan.

Publication information

Mol Inform. 2025 Jan;44(1):e202400051. doi: 10.1002/minf.202400051. Epub 2024 Nov 26.

Abstract

Gaussian process regression (GPR) is a nonparametric probabilistic model capable of computing not only the predicted mean but also the predicted standard deviation, which represents the confidence level of predictions. It offers great flexibility as it can be non-linearized by designing the kernel function, made robust against outliers by altering the likelihood function, and extended to classification models. Recently, models combining deep learning with GPR, such as Deep Kernel Learning GPR, have been proposed and reported to achieve higher accuracy than GPR. However, due to its nonparametric nature, GPR is challenging to interpret. While Explainable AI (XAI) methods like LIME or kernel SHAP can interpret the predicted mean, interpreting the predicted standard deviation remains difficult. In this study, we propose a novel method to interpret the prediction of GPR by evaluating the importance of explanatory variables. We have incorporated the GPR model with the Integrated Gradients (IG) method to assess the contribution of each feature to the prediction. By evaluating the standard deviation of the posterior distribution, we show that the IG approach provides a detailed decomposition of the predictive uncertainty, attributing it to the uncertainty in individual feature contributions. This methodology not only highlights the variables that are most influential in the prediction but also provides insights into the reliability of the model by quantifying the uncertainty associated with each feature. Through this, we can obtain a deeper understanding of the model's behavior and foster trust in its predictions, especially in domains where interpretability is as crucial as accuracy.
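The idea of attributing both the predicted mean and the predicted standard deviation to input features can be illustrated with a small sketch. The code below is not the authors' implementation: it assumes a fixed-hyperparameter RBF kernel, a toy synthetic dataset, the training-set mean as the IG baseline, and finite-difference gradients, none of which are specified in the abstract. The names `TinyGPR`, `numerical_grad`, and `integrated_gradients` are hypothetical helpers introduced for illustration only.

```python
import numpy as np

def rbf(A, B, length=1.0, var=1.0):
    # Squared-exponential kernel between the rows of A and the rows of B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return var * np.exp(-0.5 * d2 / length**2)

class TinyGPR:
    """Exact GP regression with fixed hyperparameters (illustrative assumption)."""
    def __init__(self, X, y, noise=1e-2):
        self.X = X
        self.K_inv = np.linalg.inv(rbf(X, X) + noise * np.eye(len(X)))
        self.alpha = self.K_inv @ y

    def mean(self, x):
        k = rbf(x[None, :], self.X)[0]
        return k @ self.alpha

    def std(self, x):
        # Posterior std of the latent function; prior variance is 1.0 here.
        k = rbf(x[None, :], self.X)[0]
        return np.sqrt(max(1.0 - k @ self.K_inv @ k, 1e-12))

def numerical_grad(f, x, eps=1e-5):
    # Central finite differences; an autodiff library could be used instead.
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = eps
        g[i] = (f(x + e) - f(x - e)) / (2 * eps)
    return g

def integrated_gradients(f, x, baseline, steps=300):
    # Midpoint Riemann approximation of the IG path integral from baseline to x.
    alphas = (np.arange(steps) + 0.5) / steps
    grads = [numerical_grad(f, baseline + a * (x - baseline)) for a in alphas]
    return (x - baseline) * np.mean(grads, axis=0)

# Toy data (assumption): the target depends on features 0 and 1 only.
rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(30, 3))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1]

gp = TinyGPR(X, y)
x = np.array([1.0, -0.5, 0.3])
baseline = X.mean(axis=0)  # baseline choice is an assumption

attr_mean = integrated_gradients(gp.mean, x, baseline)  # per-feature share of the mean
attr_std = integrated_gradients(gp.std, x, baseline)    # per-feature share of the std

print("mean attributions:", attr_mean)
print("std attributions: ", attr_std)
```

By the completeness property of Integrated Gradients, each attribution vector should sum approximately to the difference between the model output at `x` and at `baseline`, which gives a quick sanity check on the number of integration steps.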

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f9c9/11695984/6c43d5ffac96/MINF-44-e202400051-g006.jpg
