
Comparing Explanations of Molecular Machine Learning Models Generated with Different Methods for the Calculation of Shapley Values.

Author Information

Lamens Alec, Bajorath Jürgen

Affiliations

Department of Life Science Informatics and Data Science, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Friedrich-Hirzebruch-Allee 5/6, D-53115, Bonn, Germany.

Lamarr Institute for Machine Learning and Artificial Intelligence, Rheinische Friedrich-Wilhelms-Universität Bonn, Friedrich-Hirzebruch-Allee 5/6, D-53115, Bonn, Germany.

Publication Information

Mol Inform. 2025 Mar;44(3):e202500067. doi: 10.1002/minf.202500067.

Abstract

Feature attribution methods from explainable artificial intelligence (XAI) provide explanations of machine learning models by quantifying feature importance for predictions of test instances. While features determining individual predictions have frequently been identified in machine learning applications, the consistency of feature importance-based explanations of machine learning models using different attribution methods has not been thoroughly investigated. We have systematically compared model explanations in molecular machine learning. To this end, a test system of highly accurate compound activity predictions for different targets using different machine learning methods was generated. For these predictions, explanations were computed using methodological variants of the Shapley value formalism, a popular feature attribution approach in machine learning adapted from game theory. Predictions of each model were assessed using a model-agnostic and a model-specific Shapley value-based method. The resulting feature importance distributions were characterized and compared by a global statistical analysis using diverse measures. Unexpectedly, methodological variants for Shapley value calculations yielded distinct feature importance distributions for highly accurate predictions. There was only limited agreement between alternative model explanations. Our findings suggest that feature importance-based explanations of machine learning predictions should include an assessment of consistency using alternative methods.
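The abstract contrasts a model-agnostic and a model-specific Shapley value-based method applied to the same predictions. The following minimal sketch, assuming the Python shap package, illustrates that kind of comparison: TreeSHAP (model-specific, exact for tree ensembles) versus KernelSHAP (model-agnostic approximation), with per-instance rank correlation of the resulting feature attributions. The random "fingerprint" data, the RandomForestRegressor, and all parameter choices are illustrative assumptions, not the authors' original pipeline or statistical analysis.

```python
import numpy as np
import shap
from scipy.stats import spearmanr
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
# Toy binary "fingerprint" features and a synthetic activity signal (illustrative only)
X = rng.integers(0, 2, size=(200, 50)).astype(float)
y = X[:, :5].sum(axis=1) + rng.normal(scale=0.1, size=200)

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
X_test = X[:20]

# Model-specific explanation: TreeSHAP, exact for tree ensembles
shap_tree = shap.TreeExplainer(model).shap_values(X_test)

# Model-agnostic explanation: KernelSHAP approximation against a background sample
background = shap.sample(X, 50, random_state=0)
shap_kernel = shap.KernelExplainer(model.predict, background).shap_values(X_test, nsamples=200)

# Compare the two explanations per test instance via rank correlation of attributions
for i in range(len(X_test)):
    rho, _ = spearmanr(shap_tree[i], shap_kernel[i])
    print(f"instance {i}: Spearman rho between TreeSHAP and KernelSHAP = {rho:.2f}")
```

Low or unstable correlations in such a comparison would mirror the paper's observation that alternative Shapley value calculations can yield distinct feature importance distributions even for highly accurate models.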

Figure: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6deb/11925390/8fea5d794388/MINF-44-e202500067-g009.jpg
