Longitudinal Model Shifts of Machine Learning-Based Clinical Risk Prediction Models: Evaluation Study of Multiple Use Cases Across Different Hospitals.

Author information

Cabanillas Silva Patricia, Sun Hong, Rezk Mohamed, Roccaro-Waldmeyer Diana M, Fliegenschmidt Janis, Hulde Nikolai, von Dossow Vera, Meesseman Laurent, Depraetere Kristof, Stieg Joerg, Szymanowsky Ralph, Dahlweid Fried-Michael

Affiliations

Dedalus HealthCare, Antwerp, Belgium.

Provincial Key Laboratory of Multimodal Perceiving and Intelligent Systems, Jiaxing University, Jiaxing, China.

Publication information

J Med Internet Res. 2024 Dec 13;26:e51409. doi: 10.2196/51409.

PMID: 39671571

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11681292/

Abstract

BACKGROUND

In recent years, machine learning (ML)-based models have been widely used in clinical domains to predict clinical risk events. However, in production, the performance of such models depends heavily on changes in the system and the data. The system environment changes continuously, and this dynamic nature has significant implications for prediction models, leading to performance degradation and reduced clinical efficacy. Thus, monitoring model shifts and evaluating their impact on prediction models is of utmost importance.

OBJECTIVE

This study aimed to assess the impact of a model shift on ML-based prediction models by evaluating 3 different use cases (delirium, sepsis, and acute kidney injury [AKI]) from 2 hospitals (M and H) with different patient populations, and to investigate potential model deterioration during the COVID-19 pandemic.

METHODS

We trained prediction models using retrospective data from earlier years and examined the presence of a model shift using data from more recent years. We used the area under the receiver operating characteristic curve (AUROC) to evaluate model performance and analyzed the calibration curves over time. We also assessed the influence on clinical decisions by evaluating the alert rate, the rates of over- and underdiagnosis, and the decision curve.
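The metrics named in the Methods can be computed per evaluation year from each case's observed outcome and predicted risk. The following is a minimal Python sketch under stated assumptions, not the authors' implementation: all function names are hypothetical, "overdiagnosis" and "underdiagnosis" are assumed to mean the false-positive and false-negative rates at the alert threshold, calibration uses equal-width bins, and net benefit follows the standard decision-curve definition TP/n - (FP/n) * p_t/(1 - p_t).

```python
from typing import Dict, List, Sequence, Tuple


def auroc(y: Sequence[int], p: Sequence[float]) -> float:
    """Rank-based (Mann-Whitney) AUROC; tied scores receive midranks."""
    order = sorted(range(len(p)), key=lambda i: p[i])
    ranks = [0.0] * len(p)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a block of tied predicted risks.
        while j + 1 < len(order) and p[order[j + 1]] == p[order[i]]:
            j += 1
        midrank = (i + j) / 2 + 1  # 1-based average rank of the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = midrank
        i = j + 1
    n_pos = sum(y)
    n_neg = len(y) - n_pos
    rank_sum_pos = sum(r for r, t in zip(ranks, y) if t == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def calibration_bins(y: Sequence[int], p: Sequence[float],
                     n_bins: int = 10) -> List[Tuple[float, float, int]]:
    """(mean predicted risk, observed event rate, count) for each non-empty
    equal-width bin on [0, 1]; plotting these pairs gives a calibration curve."""
    acc = [[0.0, 0.0, 0] for _ in range(n_bins)]
    for t, s in zip(y, p):
        b = min(int(s * n_bins), n_bins - 1)  # a risk of exactly 1.0 goes in the last bin
        acc[b][0] += s
        acc[b][1] += t
        acc[b][2] += 1
    return [(ps / n, ys / n, n) for ps, ys, n in acc if n > 0]


def threshold_metrics(y: Sequence[int], p: Sequence[float],
                      threshold: float) -> Dict[str, float]:
    """Alert-level metrics, assuming an alert fires when predicted risk >= threshold."""
    tp = fp = fn = tn = 0
    for t, s in zip(y, p):
        alert = s >= threshold
        if alert and t == 1:
            tp += 1
        elif alert:
            fp += 1
        elif t == 1:
            fn += 1
        else:
            tn += 1
    n = len(y)
    return {
        "alert_rate": (tp + fp) / n,
        # Assumption: overdiagnosis = false-positive rate among non-events.
        "overdiagnosis": fp / (fp + tn) if fp + tn else 0.0,
        # Assumption: underdiagnosis = false-negative rate among events.
        "underdiagnosis": fn / (fn + tp) if fn + tp else 0.0,
        # Decision-curve net benefit at threshold probability p_t (requires p_t < 1).
        "net_benefit": tp / n - fp / n * threshold / (1 - threshold),
    }


# Hypothetical per-year evaluation data: (observed outcomes, predicted risks).
years = {"2020": ([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])}
for year, (y, p) in years.items():
    print(year, auroc(y, p), threshold_metrics(y, p, 0.3))
```

The rank-based AUROC avoids comparing every positive with every negative, which matters for data sets on the scale of the ~180,000 cases per hospital reported below.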

RESULTS

The 2 data sets used in this study contained 189,775 and 180,976 medical cases for hospitals M and H, respectively. Statistical analyses (Z test) revealed no significant difference (P>.05) between the AUROCs from different years for any use case at either hospital. For example, in hospital M, the AKI model showed no significant difference between 2020 (AUROC=0.898) and 2021 (AUROC=0.907; Z=-1.171, P=.242). Similar results were observed at both hospitals for the other use cases (sepsis and delirium) when all years were compared. However, the calibration curves at the 2 hospitals revealed model shifts for the delirium and sepsis use cases but not for AKI. Additionally, to investigate the clinical utility of our models, we performed decision curve analysis (DCA) and compared the results across years. A pairwise nonparametric statistical comparison showed no differences in net benefit at the probability thresholds of interest (P>.05). These comprehensive evaluations confirmed robust performance of all investigated models across the years. Moreover, neither performance deterioration nor alert surges were observed during the COVID-19 pandemic.
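The abstract does not state which Z test was used. Because AUROCs from different years come from different patients, one common choice is an unpaired test with standard errors from Hanley and McNeil (1982); the sketch below assumes that variant, and its function names are hypothetical.

```python
import math
from typing import Tuple


def hanley_mcneil_se(auc: float, n_pos: int, n_neg: int) -> float:
    """Standard error of an AUROC per Hanley and McNeil (1982)."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc ** 2 / (1 + auc)
    var = (auc * (1 - auc)
           + (n_pos - 1) * (q1 - auc ** 2)
           + (n_neg - 1) * (q2 - auc ** 2)) / (n_pos * n_neg)
    return math.sqrt(var)


def two_sided_p(z: float) -> float:
    """Two-sided P value for a standard normal Z statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def z_test_independent(auc1: float, se1: float,
                       auc2: float, se2: float) -> Tuple[float, float]:
    """Unpaired Z test for two AUROCs estimated on non-overlapping samples."""
    z = (auc1 - auc2) / math.sqrt(se1 ** 2 + se2 ** 2)
    return z, two_sided_p(z)


# Consistency check on the reported hospital M AKI comparison.
print(round(two_sided_p(-1.171), 3))  # prints 0.242
```

The check reproduces the reported P=.242 from Z=-1.171. Working backwards, a difference of 0.898 - 0.907 = -0.009 at Z=-1.171 implies a combined standard error of roughly 0.0077; per-year event counts are not given in the abstract, so `hanley_mcneil_se` cannot be evaluated on the study's own data here.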

CONCLUSIONS

Clinical risk prediction models were affected by the dynamic and continuous evolution of clinical practices and workflows. When assessed using AUROCs, the performance of the models evaluated in this study appeared stable, showing no significant variation over the years. Further model shift investigations suggested that a calibration shift was present for certain use cases (delirium and sepsis). However, these changes did not affect the clinical utility of the models as assessed by DCA. Consequently, it is crucial to closely monitor data changes and detect possible model shifts, along with their potential influence on clinical decision-making.
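One lightweight way to act on this recommendation is to track calibration-in-the-large, the ratio of observed events to the sum of predicted risks (O:E), for each monitoring period. The sketch below is an illustrative assumption, not the monitoring procedure used in the study; the function names and the 0.2 tolerance are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def observed_expected_ratio(y: Sequence[int], p: Sequence[float]) -> float:
    """Calibration-in-the-large: observed events divided by the sum of predicted
    risks. A value near 1.0 means the model predicts about the right number of events."""
    expected = sum(p)
    return sum(y) / expected if expected > 0 else float("nan")


def flag_calibration_shift(
        periods: Dict[str, Tuple[Sequence[int], Sequence[float]]],
        tolerance: float = 0.2) -> List[str]:
    """Return periods whose O:E ratio deviates from 1.0 by more than `tolerance`.
    The 0.2 default is hypothetical. A NaN ratio (no predicted risk) compares
    False and is therefore never flagged."""
    flagged = []
    for period, (y, p) in sorted(periods.items()):
        ratio = observed_expected_ratio(y, p)
        if abs(ratio - 1.0) > tolerance:
            flagged.append(period)
    return flagged


# Toy data: identical predicted risks, but fewer observed events in 2021.
periods = {
    "2019": ([0, 1, 0, 1], [0.2, 0.7, 0.3, 0.8]),
    "2021": ([0, 0, 0, 1], [0.2, 0.7, 0.3, 0.8]),
}
print(flag_calibration_shift(periods))  # prints ['2021']
```

In the toy data, every positive case outranks every negative case in both periods, so AUROC would be 1.0 for each, yet the 2021 O:E ratio falls to about 0.5. This mirrors the pattern reported above, in which AUROC stayed stable while calibration shifted.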

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dc32/11681292/4a7b23523933/jmir_v26i1e51409_fig1.jpg
