Interpretable machine learning models for hospital readmission prediction: a two-step extracted regression tree approach.

Affiliations

School of Industrial Engineering, Purdue University, West Lafayette, USA.

Department of Electrical Engineering and Computer Sciences, UC Berkeley, Berkeley, USA.

Publication information

BMC Med Inform Decis Mak. 2023 Jun 5;23(1):104. doi: 10.1186/s12911-023-02193-5.

Abstract

BACKGROUND

Advanced machine learning models have received wide attention for assisting medical decision making because of the high accuracy they can achieve. However, their limited interpretability remains a barrier to adoption by practitioners. Recent advances in interpretable machine learning tools make it possible to look inside the black box of advanced prediction methods and extract interpretable models with similar prediction accuracy, but few studies have applied this approach to the specific problem of hospital readmission prediction.

METHODS

Our goal is to develop a machine-learning (ML) algorithm that predicts 30- and 90-day hospital readmissions as accurately as black-box algorithms while providing medically interpretable insights into readmission risk factors. Leveraging a state-of-the-art interpretable ML model, we use a two-step Extracted Regression Tree approach to achieve this goal. In the first step, we train a black-box prediction algorithm. In the second step, we extract a regression tree from the output of the black-box algorithm, which allows direct interpretation of medically relevant risk factors. We use data from a large teaching hospital in Asia to train the ML model and verify our two-step approach.
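As a concrete illustration of the two-step extraction (distilling a black box into a tree surrogate), here is a minimal Python sketch using scikit-learn. The abstract does not specify the paper's exact pipeline, so the black-box model, the synthetic features, the tree depth, and all variable names below are illustrative assumptions only.

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeRegressor, export_text

rng = np.random.default_rng(0)

# Hypothetical patient features and binary 30-day readmission labels.
X = rng.normal(size=(1000, 5))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=1000) > 0).astype(int)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Step 1: train a black-box classifier on the observed labels.
black_box = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
black_box.fit(X_train, y_train)

# Step 2: fit a shallow regression tree to the black box's predicted
# readmission probabilities (soft labels), yielding an interpretable surrogate.
soft_labels = black_box.predict_proba(X_train)[:, 1]
surrogate = DecisionTreeRegressor(max_depth=3, random_state=0)
surrogate.fit(X_train, soft_labels)

# Each split in the tree reads directly as a risk-factor threshold.
print(export_text(surrogate, feature_names=[f"x{i}" for i in range(5)]))

The printed split rules are the interpretability payoff of the second step: each path through the tree is a threshold-based risk profile a clinician can inspect.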

RESULTS

The two-step method obtains prediction performance comparable to that of the best black-box models, such as neural networks, as measured by three metrics: accuracy, the area under the ROC curve (AUC), and the area under the precision-recall curve (AUPRC), while maintaining interpretability. Further, to examine whether the prediction results match known medical insights (i.e., whether the model is truly interpretable and produces reasonable results), we show that the key readmission risk factors extracted by the two-step approach are consistent with those reported in the medical literature.
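For concreteness, the three metrics named above can be computed with scikit-learn as in the sketch below, which continues the illustrative example from METHODS (black_box, surrogate, and the data splits are assumed from that sketch, not taken from the paper); accuracy here uses a hypothetical 0.5 threshold on the predicted probabilities.

from sklearn.metrics import accuracy_score, average_precision_score, roc_auc_score

# Score both the black box and its extracted surrogate on held-out data.
for name, scores in [
    ("black box", black_box.predict_proba(X_test)[:, 1]),
    ("extracted tree", surrogate.predict(X_test)),
]:
    print(
        f"{name}: accuracy={accuracy_score(y_test, scores >= 0.5):.3f}  "
        f"AUC={roc_auc_score(y_test, scores):.3f}  "
        f"AUPRC={average_precision_score(y_test, scores):.3f}"
    )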

CONCLUSIONS

The proposed two-step approach yields meaningful prediction results that are both accurate and interpretable. This study suggests the two-step approach as a viable means of improving trust in machine-learning-based readmission prediction models in clinical practice.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e833/10243084/be2f227c3c82/12911_2023_2193_Fig1_HTML.jpg
