Machine Learning-Based Prediction Models for Different Clinical Risks in Different Hospitals: Evaluation of Live Performance.

Affiliations

Dedalus Healthcare, Antwerp, Belgium.

Institute of Anesthesiology and Pain Therapy, Heart and Diabetes Centre North Rhine-Westphalia, University Hospital of Ruhr-University Bochum, Bad Oeynhausen, Germany.

Publication information

J Med Internet Res. 2022 Jun 7;24(6):e34295. doi: 10.2196/34295.

DOI:10.2196/34295
PMID:35502887
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9214618/
Abstract

BACKGROUND

Machine learning algorithms are currently used in a wide array of clinical domains to produce models that can predict clinical risk events. Most models are developed and evaluated with retrospective data; very few are evaluated in a clinical workflow, and even fewer report performance across different hospitals. In this study, we provide detailed evaluations of clinical risk prediction models in live clinical workflows for three different use cases in three different hospitals.

OBJECTIVE

The main objective of this study was to evaluate clinical risk prediction models in live clinical workflows and to compare their performance in these settings with their performance on retrospective data. We also aimed to generalize the results by applying our investigation to three different use cases in three different hospitals.

METHODS

We trained clinical risk prediction models for three use cases (ie, delirium, sepsis, and acute kidney injury) in three different hospitals with retrospective data. We used machine learning and, specifically, deep learning to train models that were based on the Transformer model. The models were trained using a calibration tool that is common for all hospitals and use cases. The models had a common design but were calibrated using each hospital's specific data. The models were deployed in these three hospitals and used in daily clinical practice. The predictions made by these models were logged and correlated with the diagnosis at discharge. We compared their performance with evaluations on retrospective data and conducted cross-hospital evaluations.
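
The abstract does not say how the logged predictions were scored against discharge diagnoses. As a minimal sketch, assuming each log entry pairs a predicted risk with a binary diagnosis flag, the AUROC can be computed as the fraction of positive/negative pairs that the model ranks correctly. The log layout, the patient identifiers, and the `auroc` helper are hypothetical and are not taken from the study.

```python
from typing import Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUROC as the probability that a random positive outranks a random negative.

    Ties count as half a win (the Mann-Whitney U formulation).
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical log: (patient id, predicted risk, diagnosis coded at discharge).
prediction_log = [
    ("p1", 0.91, 1),
    ("p2", 0.15, 0),
    ("p3", 0.62, 1),
    ("p4", 0.70, 0),
    ("p5", 0.08, 0),
    ("p6", 0.55, 1),
]

live_scores = [risk for _, risk, _ in prediction_log]
live_labels = [dx for _, _, dx in prediction_log]
live_auroc = auroc(live_scores, live_labels)
print(f"live AUROC: {live_auroc:.3f}")
```

The pairwise form is quadratic in the number of cases, which is fine for illustration; a rank-based implementation would be the usual choice for large logs.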

RESULTS

The performance of the prediction models with data from live clinical workflows was similar to the performance with retrospective data. The average area under the receiver operating characteristic curve (AUROC) decreased slightly, by 0.6 percentage points (from 94.8% to 94.2% at discharge). The cross-hospital evaluations exhibited severely reduced performance: the average AUROC decreased by approximately 8 percentage points (from 94.2% to 86.3% at discharge), which indicates the importance of model calibration with data from the deployment hospital.
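
The two reported drops are absolute differences in percentage points, not relative changes. The short sketch below recomputes them from the averages quoted above; the function and variable names are illustrative only.

```python
def drop_in_points(before_pct: float, after_pct: float) -> float:
    """Absolute change between two percentages, in percentage points."""
    return round(before_pct - after_pct, 1)


# Average AUROC values at discharge, as reported in the abstract.
retrospective, live, cross_hospital = 94.8, 94.2, 86.3

live_drop = drop_in_points(retrospective, live)      # retrospective -> live
cross_drop = drop_in_points(live, cross_hospital)    # live -> cross-hospital

print(f"retrospective to live: {live_drop} points")
print(f"live to cross-hospital: {cross_drop} points")
```

The second difference comes out at 7.9 points, which the abstract rounds to 8.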

CONCLUSIONS

Calibrating the prediction model with data from different deployment hospitals led to good performance in live settings. The performance degradation in the cross-hospital evaluation identified limitations in developing a generic model for different hospitals. Designing a generic process for model development to generate specialized prediction models for each hospital guarantees model performance in different hospitals.
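
One way to picture the "generic process, specialized model" idea is a cross-hospital evaluation matrix: the same training routine is run on each hospital's data, and every resulting model is scored on every hospital. The sketch below is a toy illustration under strong assumptions. The one-feature `fit` routine, the hospital names, and the synthetic data (deliberately exaggerated so that one site records the feature on an inverted scale) are all hypothetical and do not reproduce the study's Transformer-based models or its data.

```python
from typing import Callable, Dict, List, Tuple

Dataset = List[Tuple[float, int]]          # (feature value, outcome label)
Model = Callable[[float], float]           # feature value -> risk score


def fit(data: Dataset) -> Model:
    """Generic training step: learn which direction of the feature signals risk."""
    pos = [x for x, y in data if y == 1]
    neg = [x for x, y in data if y == 0]
    sign = 1.0 if sum(pos) / len(pos) >= sum(neg) / len(neg) else -1.0
    return lambda x: sign * x


def auroc(model: Model, data: Dataset) -> float:
    """Probability that a positive case scores above a negative one (ties = 0.5)."""
    pos = [model(x) for x, y in data if y == 1]
    neg = [model(x) for x, y in data if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Synthetic data: in hospital B the same feature is recorded on an inverted scale.
hospitals: Dict[str, Dataset] = {
    "A": [(0.9, 1), (0.8, 1), (0.3, 0), (0.2, 0)],
    "B": [(0.1, 1), (0.2, 1), (0.7, 0), (0.8, 0)],
}

# Same process for every site, one specialized model per site.
models = {name: fit(data) for name, data in hospitals.items()}

# Diagonal entries are same-hospital scores; off-diagonal entries are cross-hospital.
matrix = {
    (trained_on, tested_on): auroc(model, data)
    for trained_on, model in models.items()
    for tested_on, data in hospitals.items()
}
for (trained_on, tested_on), value in sorted(matrix.items()):
    print(f"trained on {trained_on}, tested on {tested_on}: AUROC {value:.2f}")
```

In this toy setup the diagonal stays high while the off-diagonal entries collapse, which mirrors the direction of the reported cross-hospital degradation but not its magnitude.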

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c382/9214618/7b5530c3a28c/jmir_v24i6e34295_fig1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c382/9214618/355e3b23ca37/jmir_v24i6e34295_fig2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c382/9214618/979b88ead6f9/jmir_v24i6e34295_fig3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c382/9214618/e57dee181dba/jmir_v24i6e34295_fig4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c382/9214618/a7504281f39f/jmir_v24i6e34295_fig5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c382/9214618/c7646a9b917b/jmir_v24i6e34295_fig6.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c382/9214618/7607efa466be/jmir_v24i6e34295_fig7.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c382/9214618/3069e8d2ef0c/jmir_v24i6e34295_fig8.jpg
