Interpretation of machine learning models using Shapley values: application to compound potency and multi-target activity predictions.

Affiliation

Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Endenicher Allee 19c, 53115, Bonn, Germany.

Publication information

J Comput Aided Mol Des. 2020 Oct;34(10):1013-1026. doi: 10.1007/s10822-020-00314-0. Epub 2020 May 2.

DOI:10.1007/s10822-020-00314-0

PMID:32361862

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7449951/

Abstract

Difficulties in interpreting machine learning (ML) models and their predictions limit the practical applicability of, and confidence in, ML in pharmaceutical research. There is a need for model-agnostic approaches that aid in the interpretation of ML models regardless of their complexity and that are also applicable to deep neural network (DNN) architectures and model ensembles. To these ends, the SHapley Additive exPlanations (SHAP) methodology has recently been introduced. The SHAP approach enables the identification and prioritization of features that determine compound classification and activity prediction using any ML model. Herein, we further extend the evaluation of the SHAP methodology by investigating a variant for the exact calculation of Shapley values for decision tree methods and systematically comparing this variant with the model-independent SHAP method in compound activity and potency value predictions. Moreover, new applications of the SHAP analysis approach are presented, including the interpretation of DNN models for the generation of multi-target activity profiles and of ensemble regression models for potency prediction.
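As an illustration of what the exact calculation of Shapley values involves, the following minimal sketch enumerates every feature coalition for a small toy model. It is not the authors' implementation and does not use the SHAP library; the names `shapley_values`, `make_value_fn`, and `toy_model` are hypothetical, and absent features are simply replaced by a fixed baseline value, which is a simplifying assumption (SHAP itself averages over background data).

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, n_features):
    """Exact Shapley values by enumerating all coalitions (cost grows as 2^n)."""
    phi = [0.0] * n_features
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        for size in range(len(others) + 1):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size
            weight = (factorial(size) * factorial(n_features - size - 1)
                      / factorial(n_features))
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of feature i when added to coalition s
                phi[i] += weight * (value_fn(s | {i}) - value_fn(s))
    return phi


def make_value_fn(model, instance, baseline):
    """Assumption: features outside the coalition take their baseline value."""
    def value_fn(coalition):
        mixed = [instance[j] if j in coalition else baseline[j]
                 for j in range(len(instance))]
        return model(mixed)
    return value_fn


def toy_model(x):
    # Hypothetical score on three binary fingerprint bits:
    # bit 0 acts alone, bits 1 and 2 only matter together.
    return 2.0 * x[0] + 1.0 * x[1] * x[2]


instance = [1, 1, 1]
baseline = [0, 0, 0]
phi = shapley_values(make_value_fn(toy_model, instance, baseline), 3)
print(phi)
```

In this toy case the contributions sum to the difference between the prediction for the instance and the prediction for the baseline, and the two interacting bits share their joint effect equally. Brute-force enumeration is only feasible for a handful of features, which is why tree-specific exact algorithms and sampling-based model-independent approximations matter in practice.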

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/525b/7449951/b604d3152094/10822_2020_314_Fig1_HTML.jpg

Similar articles

Interpretation of Compound Activity Predictions from Complex Machine Learning Models Using Local Approximations and Shapley Values.

J Med Chem. 2020 Aug 27;63(16):8761-8777. doi: 10.1021/acs.jmedchem.9b01101. Epub 2019 Sep 26.

Evaluation of multi-target deep neural network models for compound potency prediction under increasingly challenging test conditions.

J Comput Aided Mol Des. 2021 Mar;35(3):285-295. doi: 10.1007/s10822-021-00376-8. Epub 2021 Feb 17.

Classification and Explanation for Intrusion Detection System Based on Ensemble Trees and SHAP Method.

Sensors (Basel). 2022 Feb 3;22(3):1154. doi: 10.3390/s22031154.

Explaining multivariate molecular diagnostic tests via Shapley values.

BMC Med Inform Decis Mak. 2021 Jul 8;21(1):211. doi: 10.1186/s12911-021-01569-9.

Calculation of exact Shapley values for support vector machines with Tanimoto kernel enables model interpretation.

iScience. 2022 Aug 27;25(9):105023. doi: 10.1016/j.isci.2022.105023. eCollection 2022 Sep 16.

Calculation of exact Shapley values for explaining support vector machine models using the radial basis function kernel.

Sci Rep. 2023 Nov 10;13(1):19561. doi: 10.1038/s41598-023-46930-2.

Explaining Multiclass Compound Activity Predictions Using Counterfactuals and Shapley Values.

Molecules. 2023 Jul 24;28(14):5601. doi: 10.3390/molecules28145601.

Explanation of machine learning models using shapley additive explanation and application for real data in hospital.

Comput Methods Programs Biomed. 2022 Feb;214:106584. doi: 10.1016/j.cmpb.2021.106584. Epub 2021 Dec 10.

An explainable predictive model for suicide attempt risk using an ensemble learning and Shapley Additive Explanations (SHAP) approach.

Asian J Psychiatr. 2023 Jan;79:103316. doi: 10.1016/j.ajp.2022.103316. Epub 2022 Nov 7.

Cited by

Spatial and Temporal Inconsistency of Forest Resilience and Forest Vegetation Greening in Southwest China Under Climate Change.

Plants (Basel). 2025 Aug 11;14(16):2493. doi: 10.3390/plants14162493.

ACLPred: an explainable machine learning and tree-based ensemble model for anticancer ligand prediction.

Sci Rep. 2025 Aug 25;15(1):31268. doi: 10.1038/s41598-025-16575-4.

Cost-Efficient Early Diagnostic Tool for Lung Cancer: Explainable AI in Clinical Systems.

Technol Cancer Res Treat. 2025 Jan-Dec;24:15330338251370239. doi: 10.1177/15330338251370239. Epub 2025 Aug 14.

Investigation of Growth Differentiation Factor 15 as a Prognostic Biomarker for Major Adverse Limb Events in Peripheral Artery Disease.

J Clin Med. 2025 Jul 24;14(15):5239. doi: 10.3390/jcm14155239.

Growth Differentiation Factor 15 Predicts Cardiovascular Events in Peripheral Artery Disease.

Biomolecules. 2025 Jul 11;15(7):991. doi: 10.3390/biom15070991.

Unveiling key pathomic features for automated diagnosis and Gleason grade estimation in prostate cancer.

BMC Med Imaging. 2025 Jul 28;25(1):299. doi: 10.1186/s12880-025-01841-8.

"intelligent Read Across (iRA)"- A tool for read-across-based toxicity prediction of nanoparticles.

Comput Struct Biotechnol J. 2025 Jul 17;29:186-200. doi: 10.1016/j.csbj.2025.07.032. eCollection 2025.

Recent advances in AI-based toxicity prediction for drug discovery.

Front Chem. 2025 Jul 8;13:1632046. doi: 10.3389/fchem.2025.1632046. eCollection 2025.

Prediction of birthweight with early and mid-pregnancy antenatal markers utilising machine learning and explainable artificial intelligence.

Sci Rep. 2025 Jul 19;15(1):26223. doi: 10.1038/s41598-025-11837-7.

The role of artificial intelligence in maternal and child health: Progress, controversies, and future directions.

PLOS Digit Health. 2025 Jul 17;4(7):e0000938. doi: 10.1371/journal.pdig.0000938. eCollection 2025 Jul.

References

From Local Explanations to Global Understanding with Explainable AI for Trees.

Nat Mach Intell. 2020 Jan;2(1):56-67. doi: 10.1038/s42256-019-0138-9. Epub 2020 Jan 17.

Interpretation of Compound Activity Predictions from Complex Machine Learning Models Using Local Approximations and Shapley Values.

J Med Chem. 2020 Aug 27;63(16):8761-8777. doi: 10.1021/acs.jmedchem.9b01101. Epub 2019 Sep 26.

Support Vector Machine Classification and Regression Prioritize Different Structural Features for Binary Compound Activity and Potency Value Prediction.

ACS Omega. 2017 Oct 31;2(10):6371-6379. doi: 10.1021/acsomega.7b01079. Epub 2017 Oct 4.

Machine learning in chemoinformatics and drug discovery.

Drug Discov Today. 2018 Aug;23(8):1538-1546. doi: 10.1016/j.drudis.2018.05.010. Epub 2018 May 8.

Interpretation of Quantitative Structure-Activity Relationship Models: Past, Present, and Future.

J Chem Inf Model. 2017 Nov 27;57(11):2618-2639. doi: 10.1021/acs.jcim.7b00274. Epub 2017 Oct 13.

Assessing Scaffold Diversity of Kinase Inhibitors Using Alternative Scaffold Concepts and Estimating the Scaffold Hopping Potential for Different Kinases.

Molecules. 2017 May 3;22(5):730. doi: 10.3390/molecules22050730.

Computational Method for the Systematic Identification of Analog Series and Key Compounds Representing Series and Their Biological Activity Profiles.

J Med Chem. 2016 Aug 25;59(16):7667-76. doi: 10.1021/acs.jmedchem.6b00906. Epub 2016 Aug 8.

Visual Interpretation of Kernel-Based Prediction Models.

Mol Inform. 2011 Sep;30(9):817-26. doi: 10.1002/minf.201100059. Epub 2011 Sep 5.

ZINC 15--Ligand Discovery for Everyone.

J Chem Inf Model. 2015 Nov 23;55(11):2324-37. doi: 10.1021/acs.jcim.5b00559. Epub 2015 Nov 9.

Visualization and Interpretation of Support Vector Machine Activity Predictions.

J Chem Inf Model. 2015 Jun 22;55(6):1136-47. doi: 10.1021/acs.jcim.5b00175. Epub 2015 Jun 2.
