Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Endenicher Allee 19c, 53115, Bonn, Germany.
J Comput Aided Mol Des. 2020 Oct;34(10):1013-1026. doi: 10.1007/s10822-020-00314-0. Epub 2020 May 2.
Difficulties in interpreting machine learning (ML) models and their predictions limit the practical applicability of, and confidence in, ML in pharmaceutical research. There is a need for model-agnostic approaches that aid in the interpretation of ML models regardless of their complexity and that are also applicable to deep neural network (DNN) architectures and model ensembles. To these ends, the SHapley Additive exPlanations (SHAP) methodology has recently been introduced. The SHAP approach enables the identification and prioritization of features that determine compound classification and activity prediction using any ML model. Herein, we further extend the evaluation of the SHAP methodology by investigating a variant for the exact calculation of Shapley values for decision tree methods and systematically comparing this variant with the model-independent SHAP method in compound activity and potency value predictions. Moreover, new applications of the SHAP analysis approach are presented, including the interpretation of DNN models for the generation of multi-target activity profiles and of ensemble regression models for potency prediction.
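To make the notion of a Shapley value in the abstract concrete, the following is a minimal sketch of exact Shapley value computation by brute-force enumeration of feature subsets. It is not the authors' implementation, and it is neither the tree-specific exact variant nor the model-independent approximation that the abstract compares. The names `shapley_values` and `toy_model`, the three-feature input, and the convention of replacing "absent" features with a fixed baseline value are all assumptions made purely for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one instance by enumerating all feature subsets.

    Assumption: a feature that is "absent" from a subset takes its value from
    `baseline`. The cost grows exponentially with the number of features, so
    this is only practical for very small inputs.
    """
    n = len(x)

    def value(subset):
        # Model output when only the features in `subset` take their real values.
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            # Shapley weight for subsets of size k that exclude feature i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for combo in combinations(others, k):
                s = set(combo)
                # Marginal contribution of feature i when added to subset s.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(z):
    # Hypothetical model: one additive feature plus one pairwise interaction.
    return 2.0 * z[0] + z[1] * z[2]


x = [1, 1, 1]          # hypothetical instance (e.g. three fingerprint bits set)
baseline = [0, 0, 0]   # hypothetical reference input (all bits unset)
phi = shapley_values(toy_model, x, baseline)
print(phi)
```

In this toy case the additive feature receives its full effect, the interaction term is split equally between the two interacting features, and the contributions sum to the difference between the prediction for `x` and the prediction for `baseline`. This additivity is the property that allows feature contributions to be ranked for an individual prediction.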