GPUTreeShap: massively parallel exact calculation of SHAP scores for tree ensembles.

Authors

Mitchell Rory, Frank Eibe, Holmes Geoffrey

Affiliations

Nvidia, Santa Clara, United States.

University of Waikato, Hamilton, New Zealand.

Publication

PeerJ Comput Sci. 2022 Apr 5;8:e880. doi: 10.7717/peerj-cs.880. eCollection 2022.

Abstract

SHapley Additive exPlanation (SHAP) values (Lundberg & Lee, 2017) provide a game-theoretic interpretation of the predictions of machine learning models based on Shapley values (Shapley, 1953). While exact calculation of SHAP values is computationally intractable in general, a recursive polynomial-time algorithm called TreeShap (Lundberg et al., 2020) is available for decision tree models. However, despite its polynomial time complexity, TreeShap can become a significant bottleneck in practical machine learning pipelines when applied to large decision tree ensembles. Unfortunately, the complicated TreeShap algorithm is difficult to map to hardware accelerators such as GPUs. In this work, we present GPUTreeShap, a reformulated TreeShap algorithm suitable for massively parallel computation on graphics processing units. Our approach first preprocesses each decision tree to isolate variable-sized sub-problems from the original recursive algorithm, then solves a bin packing problem, and finally maps sub-problems to single-instruction, multiple-thread (SIMT) tasks for parallel execution with specialised hardware instructions. With a single NVIDIA Tesla V100-32 GPU, we achieve speedups of up to 19× for SHAP values, and speedups of up to 340× for SHAP interaction values, over a state-of-the-art multi-core CPU implementation executed on two 20-core Xeon E5-2698 v4 2.2 GHz CPUs. We also experiment with multi-GPU computing using eight V100 GPUs, demonstrating throughput of 1.2 M rows per second; equivalent CPU-based performance is estimated to require 6,850 CPU cores.
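The bin packing step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes, for illustration only, that each sub-problem's size is the number of threads it needs, and packs sub-problems into fixed-capacity bins of 32 (the NVIDIA warp size) using first-fit-decreasing, a standard bin packing heuristic.

```python
# Illustrative sketch of the bin packing step, not GPUTreeShap's actual code.
# Hypothetical assumption: each sub-problem size is the thread count it needs,
# and bins correspond to 32-thread SIMT warps.

WARP_SIZE = 32  # threads per warp on NVIDIA GPUs


def pack_subproblems(sizes):
    """Pack sub-problem sizes into warp-sized bins via first-fit-decreasing."""
    bins = []  # each bin is a list of sizes summing to at most WARP_SIZE
    for size in sorted(sizes, reverse=True):
        for b in bins:
            if sum(b) + size <= WARP_SIZE:
                b.append(size)  # fits in an existing warp
                break
        else:
            bins.append([size])  # open a new warp for this sub-problem
    return bins


# Hypothetical sub-problem sizes from a small tree ensemble
sizes = [5, 12, 3, 8, 20, 7, 4, 9]
warps = pack_subproblems(sizes)
```

Packing many small sub-problems into each warp keeps SIMT lanes busy, which is the point of this step: a warp executes its lanes in lockstep, so idle lanes are wasted throughput.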

Figure: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/27b9/9044362/fea34ac96916/peerj-cs-08-880-g001.jpg
