GPUTreeShap: massively parallel exact calculation of SHAP scores for tree ensembles.

Authors

Mitchell Rory, Frank Eibe, Holmes Geoffrey

Affiliations

Nvidia, Santa Clara, United States.

University of Waikato, Hamilton, New Zealand.

Publication

PeerJ Comput Sci. 2022 Apr 5;8:e880. doi: 10.7717/peerj-cs.880. eCollection 2022.

Abstract

SHapley Additive exPlanation (SHAP) values (Lundberg & Lee, 2017) provide a game-theoretic interpretation of the predictions of machine learning models based on Shapley values (Shapley, 1953). While exact calculation of SHAP values is computationally intractable in general, a recursive polynomial-time algorithm called TreeShap (Lundberg et al., 2020) is available for decision tree models. However, despite its polynomial time complexity, TreeShap can become a significant bottleneck in practical machine learning pipelines when applied to large decision tree ensembles. Unfortunately, the complicated TreeShap algorithm is difficult to map to hardware accelerators such as GPUs. In this work, we present GPUTreeShap, a reformulated TreeShap algorithm suitable for massively parallel computation on graphics processing units. Our approach first preprocesses each decision tree to isolate variable-sized sub-problems from the original recursive algorithm, then solves a bin packing problem, and finally maps sub-problems to single-instruction, multiple-thread (SIMT) tasks for parallel execution with specialised hardware instructions. With a single NVIDIA Tesla V100-32 GPU, we achieve speedups of up to 19× for SHAP values, and speedups of up to 340× for SHAP interaction values, over a state-of-the-art multi-core CPU implementation executed on two 20-core Xeon E5-2698 v4 2.2 GHz CPUs. We also experiment with multi-GPU computing using eight V100 GPUs, demonstrating throughput of 1.2 M rows per second; equivalent CPU-based performance is estimated to require 6,850 CPU cores.
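The bin packing step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes, for illustration only, that each sub-problem's size is the number of threads it needs, and packs sub-problems into fixed-capacity bins of 32 (the NVIDIA warp size) using first-fit-decreasing, a standard bin packing heuristic.

```python
# Illustrative sketch of the bin packing step, not GPUTreeShap's actual code.
# Hypothetical assumption: each sub-problem size is the thread count it needs,
# and bins correspond to 32-thread SIMT warps.

WARP_SIZE = 32  # threads per warp on NVIDIA GPUs


def pack_subproblems(sizes):
    """Pack sub-problem sizes into warp-sized bins via first-fit-decreasing."""
    bins = []  # each bin is a list of sizes summing to at most WARP_SIZE
    for size in sorted(sizes, reverse=True):
        for b in bins:
            if sum(b) + size <= WARP_SIZE:
                b.append(size)  # fits in an existing warp
                break
        else:
            bins.append([size])  # open a new warp for this sub-problem
    return bins


# Hypothetical sub-problem sizes from a small tree ensemble
sizes = [5, 12, 3, 8, 20, 7, 4, 9]
warps = pack_subproblems(sizes)
```

Packing many small sub-problems into each warp keeps SIMT lanes busy, which is the point of this step: a warp executes its lanes in lockstep, so idle lanes are wasted throughput.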

Figure: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/27b9/9044362/fea34ac96916/peerj-cs-08-880-g001.jpg
