Machine learning clinical prediction models for acute kidney injury: the impact of baseline creatinine on prediction efficacy.

Affiliations

Queensland Digital Health Centre, Faculty of Medicine, The University of Queensland, Herston, Brisbane, 4006, Australia.

Digital Health Cooperative Research Centre, Australian Government, Sydney, NSW, Australia.

Publication information

BMC Med Inform Decis Mak. 2023 Oct 9;23(1):207. doi: 10.1186/s12911-023-02306-0.

Abstract

BACKGROUND

There are many Machine Learning (ML) models which predict acute kidney injury (AKI) for hospitalised patients. While a primary goal of these models is to support clinical decision-making, the adoption of inconsistent methods of estimating baseline serum creatinine (sCr) may result in a poor understanding of these models' effectiveness in clinical practice. Until now, the performance of such models with different baselines has not been compared on a single dataset. Additionally, AKI prediction models are known to have a high rate of false positive (FP) events regardless of baseline methods. This warrants further exploration of FP events to provide insight into potential underlying reasons.

OBJECTIVE

The first aim of this study was to assess the variance in performance of ML models using three methods of baseline sCr on a retrospective dataset. The second aim was to conduct an error analysis to gain insight into the underlying factors contributing to FP events.

MATERIALS AND METHODS

Intensive Care Unit (ICU) admissions from the Medical Information Mart for Intensive Care (MIMIC)-IV dataset were used, with AKI episodes identified according to the KDIGO (Kidney Disease: Improving Global Outcomes) definition. Three methods of estimating baseline sCr were defined: (1) the minimum sCr, (2) the Modification of Diet in Renal Disease (MDRD) equation combined with the minimum sCr and (3) the MDRD equation combined with the mean of preadmission sCr. For the first aim of this study, a suite of ML models was developed for each baseline and the performance of the models was assessed. An analysis of variance was performed to test for significant differences between eXtreme Gradient Boosting (XGB) models across all baselines. To address the second aim, Explainable AI (XAI) methods were used to analyse the errors of the XGB model trained with Baseline 3.
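To make the baseline definitions concrete, the following Python sketch shows one way a baseline sCr could be estimated and a KDIGO creatinine check applied. It is an illustration under stated assumptions, not the study's code. The abstract does not say how the MDRD equation is combined with measured values, so the sketch assumes that measured sCr is used when available and that MDRD back-calculation is the fallback. It also assumes the IDMS-traceable 4-variable MDRD equation (coefficient 175), a normal eGFR of 75 mL/min/1.73 m², and only the creatinine part of the KDIGO definition. All function names are hypothetical.

```python
from statistics import mean
from typing import Optional, Sequence

ASSUMED_EGFR = 75.0  # mL/min/1.73 m^2; assumed "normal" eGFR, not stated in the abstract


def mdrd_baseline_scr(age_years: float, female: bool, black: bool = False,
                      egfr: float = ASSUMED_EGFR) -> float:
    """Back-calculate sCr (mg/dL) from the 4-variable MDRD equation.

    MDRD (IDMS-traceable): eGFR = 175 * sCr^-1.154 * age^-0.203
                                  * 0.742 (if female) * 1.212 (if Black)
    """
    factor = 175.0 * age_years ** -0.203
    if female:
        factor *= 0.742
    if black:
        factor *= 1.212
    # eGFR = factor * sCr^-1.154  =>  sCr = (eGFR / factor)^(-1 / 1.154)
    return (egfr / factor) ** (-1.0 / 1.154)


def baseline_scr(measured: Sequence[float], age_years: float, female: bool,
                 black: bool = False, how: str = "min") -> float:
    """Hypothetical combination rule: measured sCr if any, else MDRD estimate.

    how="min"  mirrors a minimum-sCr baseline,
    how="mean" mirrors a mean-of-preadmission-sCr baseline.
    """
    if measured:
        return min(measured) if how == "min" else mean(measured)
    return mdrd_baseline_scr(age_years, female, black)


def meets_kdigo_scr_criteria(current_scr: float, baseline: float,
                             lowest_scr_prior_48h: Optional[float] = None) -> bool:
    """KDIGO creatinine criteria only (urine output is not modelled here).

    AKI if sCr >= 1.5 x baseline, or sCr rises by >= 0.3 mg/dL within 48 hours.
    """
    ratio_rise = current_scr >= 1.5 * baseline
    absolute_rise = (lowest_scr_prior_48h is not None
                     and current_scr - lowest_scr_prior_48h >= 0.3)
    return ratio_rise or absolute_rise


if __name__ == "__main__":
    # Patient with no prior measurements: fall back to the MDRD estimate.
    est = baseline_scr([], age_years=60, female=False)
    print(f"MDRD back-calculated baseline: {est:.2f} mg/dL")
    # Same patient with preadmission values: the chosen rule changes the label.
    prior = [0.8, 1.2, 1.3]
    for how in ("min", "mean"):
        b = baseline_scr(prior, age_years=60, female=False, how=how)
        print(how, round(b, 2), meets_kdigo_scr_criteria(1.4, b))
```

With the illustrative values in the usage section, the same current sCr of 1.4 mg/dL is labelled AKI under the minimum baseline and not under the mean baseline, which is the kind of label shift that makes the choice of baseline matter for model evaluation.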

RESULTS

Regarding the first aim, we observed variances in discriminative metrics and calibration errors of ML models when different baseline methods were adopted. Using Baseline 1 resulted in a 14% reduction in the f1 score for both Baseline 2 and Baseline 3. There was no significant difference observed in the results between Baseline 2 and Baseline 3. For the second aim, the FP cohort was analysed using the XAI methods which led to relabelling data with the mean of sCr in 180 to 0 days pre-ICU as the preferred sCr baseline method. The XGB model using this relabelled data achieved an AUC of 0.85, recall of 0.63, precision of 0.54 and f1 score of 0.58. The cohort size was 31,586 admissions, of which 5,473 (17.32%) had AKI.
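As a quick consistency check on the reported metrics, the f1 score is the harmonic mean of precision and recall. The short Python sketch below recomputes it from the precision and recall quoted above; the helper name is hypothetical and the calculation is a reader's check, not part of the study.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    precision, recall = 0.54, 0.63  # values reported for the relabelled XGB model
    print(f"f1 = {f1_score(precision, recall):.2f}")
    # Share of positive predictions that are false positives: 1 - precision.
    print(f"false discovery rate = {1 - precision:.2f}")
```

The recomputed value rounds to the reported 0.58. A precision of 0.54 also means that a little under half of the positive predictions are false positives, which is consistent with the emphasis on analysing the FP cohort.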

CONCLUSION

In the absence of a widely accepted method of baseline sCr, AKI prediction studies need to consider the impact of different baseline methods on the effectiveness of ML models and their potential implications in real-world implementations. The utilisation of XAI methods can be effective in providing insight into the occurrence of prediction errors. This can potentially augment the success rate of ML implementation in routine care.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/529f/10563357/3f37daa74a12/12911_2023_2306_Fig1_HTML.jpg