Longitudinal Model Shifts of Machine Learning-Based Clinical Risk Prediction Models: Evaluation Study of Multiple Use Cases Across Different Hospitals.

Authors

Cabanillas Silva Patricia, Sun Hong, Rezk Mohamed, Roccaro-Waldmeyer Diana M, Fliegenschmidt Janis, Hulde Nikolai, von Dossow Vera, Meesseman Laurent, Depraetere Kristof, Stieg Joerg, Szymanowsky Ralph, Dahlweid Fried-Michael

Affiliations

Dedalus HealthCare, Antwerp, Belgium.

Provincial Key Laboratory of Multimodal Perceiving and Intelligent Systems, Jiaxing University, Jiaxing, China.

Publication

J Med Internet Res. 2024 Dec 13;26:e51409. doi: 10.2196/51409.


PMID: 39671571
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11681292/
Abstract

BACKGROUND: In recent years, machine learning (ML)-based models have been widely used in clinical domains to predict clinical risk events. However, in production, the performances of such models heavily rely on changes in the system and data. The dynamic nature of the system environment, characterized by continuous changes, has significant implications for prediction models, leading to performance degradation and reduced clinical efficacy. Thus, monitoring model shifts and evaluating their impact on prediction models are of utmost importance.

OBJECTIVE: This study aimed to assess the impact of a model shift on ML-based prediction models by evaluating 3 different use cases, namely delirium, sepsis, and acute kidney injury (AKI), from 2 hospitals (M and H) with different patient populations and investigate potential model deterioration during the COVID-19 pandemic period.

METHODS: We trained prediction models using retrospective data from earlier years and examined the presence of a model shift using data from more recent years. We used the area under the receiver operating characteristic curve (AUROC) to evaluate model performance and analyzed the calibration curves over time. We also assessed the influence on clinical decisions by evaluating the alert rate, the rates of over- and underdiagnosis, and the decision curve.

RESULTS: The 2 data sets used in this study contained 189,775 and 180,976 medical cases for hospitals M and H, respectively. Statistical analyses (Z test) revealed no significant difference (P>.05) between the AUROCs from the different years for all use cases and hospitals. For example, in hospital M, AKI did not show a significant difference between 2020 (AUROC=0.898) and 2021 (AUROC=0.907, Z=-1.171, P=.242). Similar results were observed in both hospitals and for all use cases (sepsis and delirium) when comparing all the different years. However, when evaluating the calibration curves at the 2 hospitals, model shifts were observed for the delirium and sepsis use cases but not for AKI. Additionally, to investigate the clinical utility of our models, we performed decision curve analysis (DCA) and compared the results across the different years. A pairwise nonparametric statistical comparison showed no differences in the net benefit at the probability thresholds of interest (P>.05). The comprehensive evaluations performed in this study ensured robust model performance of all the investigated models across the years. Moreover, neither performance deteriorations nor alert surges were observed during the COVID-19 pandemic period.

CONCLUSIONS: Clinical risk prediction models were affected by the dynamic and continuous evolution of clinical practices and workflows. The performance of the models evaluated in this study appeared stable when assessed using AUROCs, showing no significant variations over the years. Additional model shift investigations suggested that a calibration shift was present for certain use cases (delirium and sepsis). However, these changes did not have any impact on the clinical utility of the models based on DCA. Consequently, it is crucial to closely monitor data changes and detect possible model shifts, along with their potential influence on clinical decision-making.
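
The two quantities the abstract leans on, a Z test between AUROCs from different years and the net benefit used in decision curve analysis, can be illustrated with a minimal Python sketch. This is not the authors' code: the abstract does not say how the AUROC standard errors were obtained, so the Hanley-McNeil approximation for independent samples is an assumption here, and the function names and the case counts in the usage example are hypothetical.

```python
import math


def hanley_mcneil_se(auroc: float, n_pos: int, n_neg: int) -> float:
    """Approximate standard error of an AUROC (Hanley-McNeil formula).

    Assumption: the abstract does not state which estimator was used.
    """
    q1 = auroc / (2 - auroc)
    q2 = 2 * auroc**2 / (1 + auroc)
    variance = (
        auroc * (1 - auroc)
        + (n_pos - 1) * (q1 - auroc**2)
        + (n_neg - 1) * (q2 - auroc**2)
    ) / (n_pos * n_neg)
    return math.sqrt(variance)


def two_sided_p(z: float) -> float:
    """Two-sided P value of a standard normal Z statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def compare_aurocs(auroc_a: float, se_a: float, auroc_b: float, se_b: float):
    """Z test for two AUROCs estimated on independent samples (e.g. two years)."""
    z = (auroc_a - auroc_b) / math.sqrt(se_a**2 + se_b**2)
    return z, two_sided_p(z)


def net_benefit(y_true, y_prob, threshold: float) -> float:
    """Net benefit at one probability threshold, as plotted in a decision curve.

    NB = TP/n - FP/n * threshold / (1 - threshold)
    """
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)


if __name__ == "__main__":
    # Consistency check against the abstract: Z = -1.171 gives P of about .242.
    print(round(two_sided_p(-1.171), 3))

    # Hypothetical case counts, for illustration only (not from the paper).
    se_2020 = hanley_mcneil_se(0.898, n_pos=500, n_neg=9500)
    se_2021 = hanley_mcneil_se(0.907, n_pos=500, n_neg=9500)
    print(compare_aurocs(0.898, se_2020, 0.907, se_2021))
```

Evaluating `net_benefit` over a grid of thresholds, once per year, yields the per-year decision curves that the abstract compares. An unchanged AUROC does not rule out a calibration shift, which is why the study reports calibration curves separately.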


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dc32/11681292/4cb1d7df2673/jmir_v26i1e51409_fig4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dc32/11681292/4a7b23523933/jmir_v26i1e51409_fig1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dc32/11681292/7df520ae4422/jmir_v26i1e51409_fig2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dc32/11681292/d81a1b27b699/jmir_v26i1e51409_fig3.jpg
