B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Department of Life Science Informatics and Data Science, Rheinische Friedrich-Wilhelms-Universität, Friedrich-Hirzebruch-Allee 5/6, 53115, Bonn, Germany.
Sci Rep. 2023 Apr 12;13(1):5983. doi: 10.1038/s41598-023-33215-x.
The random forest (RF) and support vector machine (SVM) methods are mainstays in molecular machine learning (ML) and compound property prediction. We have explored in detail how binary classification models derived using these algorithms arrive at their predictions. To this end, approaches from explainable artificial intelligence (XAI) are applicable, such as the Shapley value concept originating from game theory, which we adapted and further extended for our analysis. In large-scale activity-based compound classification using models derived from training sets of increasing size, RF and SVM with the Tanimoto kernel produced very similar predictions that could hardly be distinguished. However, Shapley value analysis revealed that their learning characteristics systematically differed and that chemically intuitive explanations of accurate RF and SVM predictions had different origins.
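The Shapley value assigns each feature a share of a prediction by averaging its marginal contribution over all possible feature coalitions. The paper's adapted and extended variant is not reproduced here; the following is only a minimal, self-contained sketch of the underlying game-theoretic calculation for a hypothetical toy "model" scoring binary fingerprint bits (the weights and interaction term are illustrative assumptions, not taken from the study):

```python
from itertools import combinations
from math import factorial

def shapley_values(value_fn, n_features):
    """Exact Shapley values by enumerating all feature coalitions.

    value_fn(S) maps a set of feature indices to the model output when
    only those features are 'present'. Exact enumeration is feasible
    only for small n_features (2**n coalitions)."""
    players = range(n_features)
    phi = [0.0] * n_features
    for i in players:
        others = [j for j in players if j != i]
        for size in range(len(others) + 1):
            for subset in combinations(others, size):
                S = frozenset(subset)
                # Shapley weight: |S|! (n - |S| - 1)! / n!
                weight = (factorial(len(S)) * factorial(n_features - len(S) - 1)
                          / factorial(n_features))
                phi[i] += weight * (value_fn(S | {i}) - value_fn(S))
    return phi

# Hypothetical scoring function over three fingerprint bits, with an
# interaction between bits 0 and 1 (illustrative weights only).
weights = [0.5, 0.3, 0.2]
def predict(present):
    score = sum(weights[j] for j in present)
    if 0 in present and 1 in present:
        score += 0.4  # interaction credit is split between bits 0 and 1
    return score

phi = shapley_values(predict, 3)
# Efficiency property: contributions sum to the full-model prediction
assert abs(sum(phi) - predict({0, 1, 2})) < 1e-9
```

For this toy function the interaction term is shared equally between bits 0 and 1, while bit 2 receives exactly its independent weight, illustrating how Shapley analysis attributes a prediction to individual fingerprint features.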