
From Local Explanations to Global Understanding with Explainable AI for Trees.

Author Information

Lundberg Scott M, Erion Gabriel, Chen Hugh, DeGrave Alex, Prutkin Jordan M, Nair Bala, Katz Ronit, Himmelfarb Jonathan, Bansal Nisha, Lee Su-In

Affiliations

Microsoft Research.

Paul G. Allen School of Computer Science and Engineering, University of Washington.

Publication Information

Nat Mach Intell. 2020 Jan;2(1):56-67. doi: 10.1038/s42256-019-0138-9. Epub 2020 Jan 17.

Abstract

Tree-based machine learning models such as random forests, decision trees and gradient boosted trees are popular non-linear predictive models, yet comparatively little attention has been paid to explaining their predictions. Here, we improve the interpretability of tree-based models through three main contributions: (1) the first polynomial-time algorithm to compute optimal explanations based on game theory; (2) a new type of explanation that directly measures local feature interaction effects; and (3) a new set of tools for understanding global model structure based on combining many local explanations of each prediction. We apply these tools to three medical machine learning problems and show how combining many high-quality local explanations allows us to represent global structure while retaining local faithfulness to the original model. These tools enable us to (i) identify high-magnitude but low-frequency non-linear mortality risk factors in the US population, (ii) highlight distinct population subgroups with shared risk characteristics, (iii) identify non-linear interaction effects among risk factors for chronic kidney disease and (iv) monitor a machine learning model deployed in a hospital by identifying which features are degrading the model's performance over time. Given the popularity of tree-based machine learning models, these improvements to their interpretability have implications across a broad set of domains.

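To make the three contributions concrete, the sketch below exercises them through the authors' open-source `shap` package (https://github.com/shap/shap). This is a minimal sketch, not the paper's experimental setup: the diabetes dataset, the XGBoost model, and its hyperparameters are illustrative assumptions.

```python
# A minimal sketch using the `shap` package that accompanies this paper.
# The dataset, model, and settings below are illustrative assumptions.
import xgboost
import shap
from sklearn.datasets import load_diabetes

# Train a gradient boosted tree ensemble, one of the model families
# targeted by the paper's Tree SHAP algorithm.
X, y = load_diabetes(return_X_y=True, as_frame=True)
model = xgboost.XGBRegressor(n_estimators=100, max_depth=4).fit(X, y)

# Contribution 1: exact Shapley-value attributions for every prediction,
# computed in polynomial time over the tree ensemble.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)  # shape: (n_samples, n_features)

# Contribution 2: local feature interaction effects. The result is a
# (n_samples, n_features, n_features) tensor whose off-diagonal entries
# measure pairwise interactions for each individual prediction.
interaction_values = explainer.shap_interaction_values(X)

# Contribution 3: global understanding built from many local explanations,
# e.g. a summary (beeswarm) plot over every per-sample attribution.
shap.summary_plot(shap_values, X)
```

Averaging the absolute attributions over samples yields a global feature importance ranking that stays consistent with the underlying local explanations, which is the "local explanations to global understanding" idea in the title.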

Figure (via PubMed Central): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a29e/7326367/c539f4fb84be/nihms-1601475-f0004.jpg
