External validation of AI-based scoring systems in the ICU: a systematic review and meta-analysis.

Authors

Rockenschaub Patrick, Akay Ela Marie, Carlisle Benjamin Gregory, Hilbert Adam, Wendland Joshua, Meyer-Eschenbach Falk, Näher Anatol-Fiete, Frey Dietmar, Madai Vince Istvan

Affiliations

CLAIM - Charité Lab for AI in Medicine, Charité - Universitätsmedizin Berlin, Berlin, Germany.

QUEST Center for Responsible Research, Berlin Institute of Health at Charité Universitätsmedizin Berlin, Berlin, Germany.

Publication information

BMC Med Inform Decis Mak. 2025 Jan 6;25(1):5. doi: 10.1186/s12911-024-02830-7.

Abstract

BACKGROUND

Machine learning (ML) is increasingly used to predict clinical deterioration in intensive care unit (ICU) patients through scoring systems. Although promising, such algorithms often overfit their training cohort and perform worse at new hospitals. Thus, external validation is a critical, but frequently overlooked, step in establishing the reliability of predicted risk scores before they are translated into clinical practice. We systematically reviewed how often external validation of ML-based risk scores is performed and how their performance changed in external data.

METHODS

We searched MEDLINE, Web of Science, and arXiv for studies using ML to predict deterioration of ICU patients from routine data. We included primary research published in English before December 2023. We summarised how many studies were externally validated, assessing differences over time, by outcome, and by data source. For validated studies, we evaluated the change in area under the receiver operating characteristic curve (AUROC) attributable to external validation using linear mixed-effects models.
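The quantity analysed in the validated studies is the difference between a model's AUROC on external data and on its internal (development) data. The following is a minimal sketch of how that per-study difference could be computed, not the authors' pipeline: the `auroc` helper and the toy labels and scores are hypothetical, and the linear mixed-effects modelling step described above is not reproduced here.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (1 = patient deteriorated, 0 = did not).
internal_labels = [1, 1, 1, 0, 0, 0]
internal_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
external_labels = [1, 1, 1, 0, 0, 0]
external_scores = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2]

auroc_internal = auroc(internal_labels, internal_scores)
auroc_external = auroc(external_labels, external_scores)

# A negative value means the score discriminates worse in the external data.
delta = auroc_external - auroc_internal
print(f"internal={auroc_internal:.3f} external={auroc_external:.3f} delta={delta:.3f}")
```

The pairwise definition is quadratic in the number of cases, which is fine for illustration; for real cohorts a rank-based implementation from an established library would be the more practical choice.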

RESULTS

We included 572 studies, of which 84 (14.7%) were externally validated, increasing to 23.9% by 2023. Validated studies made disproportionate use of open-source data, with two well-known US datasets (MIMIC and eICU) accounting for 83.3% of studies. On average, AUROC changed by -0.037 (95% CI -0.052 to -0.027) in external data, with a reduction of more than 0.05 in 49.5% of studies.
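As a quick arithmetic check, the reported share of externally validated studies follows directly from the two counts given above. The variable names are illustrative only.

```python
# Counts taken from the Results paragraph.
included_studies = 572
externally_validated = 84

share = externally_validated / included_studies
share_text = f"{100 * share:.1f}%"
print(share_text)  # prints "14.7%"
```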

DISCUSSION

External validation, although increasing, remains uncommon. Performance was generally lower in external data, calling into question the reliability of some recently proposed ML-based scores. Interpretation of the results was complicated by an overreliance on the same few datasets, implicit differences in case mix, and exclusive use of AUROC.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8fae/11702098/d648be54e4ef/12911_2024_2830_Fig1_HTML.jpg
