Department of Circulation and Medical Imaging, The Norwegian University of Science and Technology, Trondheim, Norway.
Department of Electronic Systems, The Norwegian University of Science and Technology, Trondheim, Norway.
BMC Med Res Methodol. 2022 Feb 27;22(1):53. doi: 10.1186/s12874-022-01540-w.
Machine learning (ML) holds the promise of becoming an essential tool for utilising the increasing amount of clinical data available for analysis and clinical decision support. However, a lack of trust in the models has limited the acceptance of this technology in healthcare. This mistrust is often attributed to a lack of model explainability and interpretability, i.e. the relationship between the input and output of the models being unclear. Improving trust requires the development of more transparent ML methods.
In this paper, we use the publicly available eICU database to construct a number of ML models before examining their internal behaviour with SHapley Additive exPlanations (SHAP) values. Our four models predicted hospital mortality in ICU patients using a selection of the same features used to calculate the APACHE IV score and were based on random forest, logistic regression, naive Bayes, and adaptive boosting algorithms.
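As an illustration only, and not the authors' actual code, the workflow described here can be sketched as below: fit the four model types on APACHE-IV-style features and examine them with SHAP. The file path, column names, split, and hyperparameters are placeholders.

```python
# Minimal sketch of the described workflow: train four classifiers on
# APACHE-IV-style features and inspect them with SHAP values.
# The data source, feature names, and hyperparameters are illustrative placeholders.
import shap
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

# Hypothetical preprocessed feature table with a binary mortality label.
df = pd.read_csv("eicu_features.csv")  # placeholder path
X = df.drop(columns=["hospital_mortality"])
y = df["hospital_mortality"]
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

models = {
    "random_forest": RandomForestClassifier(n_estimators=500, random_state=0),
    "logistic_regression": LogisticRegression(max_iter=1000),
    "naive_bayes": GaussianNB(),
    "adaboost": AdaBoostClassifier(n_estimators=200, random_state=0),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    # Model-agnostic SHAP explainer with the training data as background.
    explainer = shap.Explainer(model.predict_proba, X_train)
    shap_values = explainer(X_test.iloc[:200])  # subsample to keep it cheap
    # Feature-impact summary for the predicted mortality probability (class 1).
    shap.plots.beeswarm(shap_values[..., 1])
```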
The results showed that the models had similar discriminative abilities and largely agreed on feature importance, while calibration and the impact of individual features differed considerably and, in several cases, did not correspond to common medical theory.
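For context, discrimination and calibration of such models are commonly compared with ROC AUC and calibration curves. A minimal sketch, continuing the illustrative `models`, `X_test`, and `y_test` from the snippet above (again, not the paper's code):

```python
# Compare discrimination (ROC AUC) and calibration for the illustrative models.
from sklearn.metrics import roc_auc_score
from sklearn.calibration import calibration_curve

for name, model in models.items():
    prob = model.predict_proba(X_test)[:, 1]
    auc = roc_auc_score(y_test, prob)
    # Observed event rate vs. mean predicted probability in 10 bins (calibration).
    frac_pos, mean_pred = calibration_curve(y_test, prob, n_bins=10)
    print(f"{name}: AUC = {auc:.3f}")
```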
We already know that ML models treat data differently depending on the underlying algorithm. Our comparative analysis visualises the implications of these differences and their importance in a healthcare setting. SHAP value analysis is a promising method for incorporating explainability into model development and usage, and might yield better and more trustworthy ML models in the future.