Interpretation of nonlinear QSAR models applied to Ames mutagenicity data.

Author information

Safety Assessment, AstraZeneca Research & Development, 43183 Mölndal, Sweden.

Publication information

J Chem Inf Model. 2009 Nov;49(11):2551-8. doi: 10.1021/ci9002206.

PMID:19824682

Abstract

A method for local interpretation of QSAR models is presented and applied to an Ames mutagenicity data set. Local interpretation of Support Vector Machine and Random Forest models is achieved by retrieving, at any point in the model, the variable corresponding to the largest component of the decision-function gradient. That variable is regarded as the most important contributor to the model at that particular point. The method has been verified using two sets of simulated data and the Ames mutagenicity data. This work indicates that it is possible to interpret nonlinear machine-learning methods. A comparison to an interpretable linear method is also presented.

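The idea of reading off the variable with the largest gradient component at a given point can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: `decision_function` is a hypothetical toy function standing in for a trained model, the descriptor names are invented, the gradient is approximated by central finite differences, and "largest" is taken to mean largest absolute value.

```python
import math


def decision_function(x):
    """Hypothetical nonlinear decision function standing in for a trained model."""
    return math.tanh(2.0 * x[0] * x[1]) + 0.1 * x[2] ** 2


def local_gradient(f, x, h=1e-5):
    """Approximate the gradient of f at point x by central finite differences."""
    grad = []
    for i in range(len(x)):
        up = list(x)
        down = list(x)
        up[i] += h
        down[i] -= h
        grad.append((f(up) - f(down)) / (2.0 * h))
    return grad


def most_important_variable(f, x, names):
    """Return the name and gradient value of the locally dominant variable.

    Assumption: dominance is judged by the absolute size of the component.
    """
    grad = local_gradient(f, x)
    idx = max(range(len(grad)), key=lambda i: abs(grad[i]))
    return names[idx], grad[idx]


# Hypothetical descriptor names, for illustration only.
NAMES = ["desc_a", "desc_b", "desc_c"]

if __name__ == "__main__":
    for point in ([0.1, 1.0, 0.5], [1.0, 0.1, 0.5], [0.0, 0.0, 3.0]):
        name, value = most_important_variable(decision_function, point, NAMES)
        print(point, "->", name, round(value, 3))
```

In this toy function a different variable dominates at each of the three query points, which is the sense in which such an interpretation is local. A piecewise-constant model such as a random forest has no useful pointwise derivative, so a real application would need a suitably chosen step size or another gradient estimate; the sketch does not address that.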