Differences in learning characteristics between support vector machine and random forest models for compound classification revealed by Shapley value analysis.

Affiliation

B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Department of Life Science Informatics and Data Science, Rheinische Friedrich-Wilhelms-Universität, Friedrich-Hirzebruch-Allee 5/6, 53115, Bonn, Germany.

Publication information

Sci Rep. 2023 Apr 12;13(1):5983. doi: 10.1038/s41598-023-33215-x.

Abstract

The random forest (RF) and support vector machine (SVM) methods are mainstays of molecular machine learning (ML) and compound property prediction. We have explored in detail how binary classification models derived using these algorithms arrive at their predictions. To this end, approaches from explainable artificial intelligence (XAI) are applicable, such as the Shapley value concept originating from game theory, which we adapted and further extended for our analysis. In large-scale activity-based compound classification using models derived from training sets of increasing size, RF and SVM with the Tanimoto kernel produced very similar predictions that could hardly be distinguished. However, Shapley value analysis revealed that their learning characteristics differed systematically and that chemically intuitive explanations of accurate RF and SVM predictions had different origins.
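The two concepts named in the abstract, the Tanimoto kernel and the Shapley value, can be illustrated on a toy scale. The sketch below is not the authors' adapted method. It is a brute-force Shapley computation over a hypothetical 4-bit fingerprint, where the "model output" is simply the Tanimoto similarity of a test compound to a single reference compound. The names `tanimoto`, `exact_shapley`, `reference_bits`, and `test_bits` are assumptions introduced for illustration only.

```python
from itertools import combinations
from math import factorial


def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    a, b = set(a), set(b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def exact_shapley(n_features, value):
    """Exact Shapley values by enumerating every coalition (feasible only for small n)."""
    phi = [0.0] * n_features
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        for k in range(len(others) + 1):
            # Weight of a coalition of size k that does not contain feature i.
            weight = factorial(k) * factorial(n_features - k - 1) / factorial(n_features)
            for subset in combinations(others, k):
                s = frozenset(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical toy data: 4 fingerprint bits, one reference and one test compound.
reference_bits = {0, 1, 2}
test_bits = {0, 1, 3}


def value(coalition):
    # Features inside the coalition keep the test compound's value;
    # features outside are set to the baseline "bit absent".
    return tanimoto(test_bits & coalition, reference_bits)


phi = exact_shapley(4, value)
for bit, contribution in enumerate(phi):
    print(f"bit {bit}: {contribution:+.4f}")
```

In this toy setting the contributions sum to the full similarity minus the empty-coalition baseline, the bit that is absent from the test compound receives zero, and the bit not shared with the reference receives a negative contribution. Exhaustive enumeration scales exponentially with the number of features, so realistic fingerprints require approximate or model-specific Shapley methods.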

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4125/10097675/de767f18ab37/41598_2023_33215_Fig1_HTML.jpg
