Standard Vocabularies to Improve Machine Learning Model Transferability With Electronic Health Record Data: Retrospective Cohort Study Using Health Care-Associated Infection.

Authors

Kiser Amber C, Eilbeck Karen, Ferraro Jeffrey P, Skarda David E, Samore Matthew H, Bucher Brian

Affiliations

Department of Biomedical Informatics, School of Medicine, University of Utah, Salt Lake City, UT, United States.

Department of Medicine, School of Medicine, University of Utah, Salt Lake City, UT, United States.

Publication information

JMIR Med Inform. 2022 Aug 30;10(8):e39057. doi: 10.2196/39057.

DOI:10.2196/39057
PMID:36040784
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9472055/
Abstract

BACKGROUND

With the widespread adoption of electronic health records (EHRs) by US hospitals, there is an opportunity to leverage these data to develop predictive algorithms that improve clinical care. A key barrier to model development and implementation is that external validation of model discrimination is rare and, when performed, often shows worse performance. One reason machine learning models do not generalize externally is data heterogeneity. A potential solution to the substantial data heterogeneity between health care systems is to map EHR data elements to standard vocabularies. The advantage of these vocabularies is the hierarchical relationship between elements, which allows specific clinical features to be aggregated into more general grouped concepts.
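The aggregation idea can be illustrated with a minimal sketch. The child-to-parent links, the concept names, and the helper functions `roll_up` and `group_features` below are hypothetical and exist only for illustration; the abstract does not specify which vocabularies or grouping levels the authors used.

```python
from typing import Dict, Iterable, Set

# Hypothetical child -> parent links. A real standard vocabulary would
# supply these relationships; the names here are invented for illustration.
PARENT: Dict[str, str] = {
    "cefazolin 1 g injection": "cefazolin",
    "cefazolin 2 g injection": "cefazolin",
    "cefazolin": "cephalosporin",
    "ceftriaxone": "cephalosporin",
    "cephalosporin": "antibacterial",
}


def roll_up(code: str, targets: Set[str], parent: Dict[str, str] = PARENT) -> str:
    """Climb the hierarchy until a target concept or a root is reached."""
    while code not in targets and code in parent:
        code = parent[code]
    return code


def group_features(codes: Iterable[str], targets: Set[str]) -> Set[str]:
    """Replace each specific code with its grouped (more general) concept."""
    return {roll_up(code, targets) for code in codes}


# Two sites record the same kind of event with different specific codes.
site_a = {"cefazolin 1 g injection"}
site_b = {"cefazolin 2 g injection", "ceftriaxone"}

grouped_a = group_features(site_a, {"cephalosporin"})
grouped_b = group_features(site_b, {"cephalosporin"})

print("shared raw features:", site_a & site_b)
print("shared grouped features:", grouped_a & grouped_b)
```

In this toy case the two sites share no raw features but share one grouped feature, which is the kind of overlap that could make a model trained at one site usable at another.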

OBJECTIVE

This study aimed to evaluate grouping EHR data using standard vocabularies to improve the transferability of machine learning models for the detection of postoperative health care-associated infections across institutions with different EHR systems.

METHODS

Patients who underwent surgery at University of Utah Health or Intermountain Healthcare from July 2014 to August 2017 and had complete follow-up data were included. The primary outcome was a health care-associated infection within 30 days of the procedure. EHR data from 0-30 days after the operation were mapped to standard vocabularies and grouped using the hierarchical relationships of the vocabularies. Model performance was measured using the area under the receiver operating characteristic curve (AUC) and F-score in internal and external validations. To evaluate model transferability, a difference-in-difference metric was defined as the difference in performance drop between internal and external validations for the baseline and grouped models.
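One plausible reading of the difference-in-difference metric is sketched below. The sign convention (baseline drop minus grouped drop, so that a positive value favors the grouped model) is an assumption inferred from the wording, and the function names and example scores are hypothetical, not values from the study.

```python
def performance_drop(internal: float, external: float) -> float:
    """Loss in a metric (e.g. AUC or F-score) when moving from internal to external validation."""
    return internal - external


def difference_in_difference(
    baseline_internal: float,
    baseline_external: float,
    grouped_internal: float,
    grouped_external: float,
) -> float:
    """Assumed convention: baseline drop minus grouped drop.

    A positive value means the grouped model lost less performance
    in external validation than the baseline model did.
    """
    baseline_drop = performance_drop(baseline_internal, baseline_external)
    grouped_drop = performance_drop(grouped_internal, grouped_external)
    return baseline_drop - grouped_drop


# Invented scores for illustration only.
did = difference_in_difference(
    baseline_internal=0.90,
    baseline_external=0.70,
    grouped_internal=0.88,
    grouped_external=0.80,
)
print(f"difference-in-difference: {did:.2f}")
```

Under this convention the metric isolates the change in transferability: a grouped model with a slightly lower internal score can still come out ahead if it degrades less at the external site.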

RESULTS

A total of 5775 patients from the University of Utah and 15,434 patients from Intermountain Healthcare were included. The prevalence of selected outcomes was from 4.9% (761/15,434) to 5% (291/5775) for surgical site infections, from 0.8% (44/5775) to 1.1% (171/15,434) for pneumonia, from 2.6% (400/15,434) to 3% (175/5775) for sepsis, and from 0.8% (125/15,434) to 0.9% (50/5775) for urinary tract infections. In all outcomes, the grouping of data using standard vocabularies resulted in a reduced drop in AUC and F-score in external validation compared to baseline features (all P<.001, except urinary tract infection AUC: P=.002). The difference-in-difference metrics ranged from 0.005 to 0.248 for AUC and from 0.075 to 0.216 for F-score.
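As a quick consistency check, the reported prevalences can be recomputed from the counts given above. The dictionary layout and variable names are illustrative; the counts are taken directly from the text.

```python
# Cohort sizes and outcome counts as reported in the Results.
COHORT_SIZE = {"University of Utah": 5775, "Intermountain Healthcare": 15434}

CASES = {
    "surgical site infection": {"University of Utah": 291, "Intermountain Healthcare": 761},
    "pneumonia": {"University of Utah": 44, "Intermountain Healthcare": 171},
    "sepsis": {"University of Utah": 175, "Intermountain Healthcare": 400},
    "urinary tract infection": {"University of Utah": 50, "Intermountain Healthcare": 125},
}


def prevalence_percent(cases: int, total: int) -> float:
    """Prevalence as a percentage rounded to one decimal place."""
    return round(100 * cases / total, 1)


prevalence = {
    outcome: {site: prevalence_percent(n, COHORT_SIZE[site]) for site, n in by_site.items()}
    for outcome, by_site in CASES.items()
}

for outcome, by_site in prevalence.items():
    print(outcome, by_site)
```

The recomputed values agree with the reported ranges after rounding to one decimal place.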

CONCLUSIONS

We demonstrated that grouping machine learning model features based on standard vocabularies improved model transferability between data sets across 2 institutions. Improving model transferability using standard vocabularies has the potential to improve the generalization of clinical prediction models across the health care system.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6baa/9472055/bb96cf455a83/medinform_v10i8e39057_fig1.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6baa/9472055/fa6e8a96acb5/medinform_v10i8e39057_fig2.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6baa/9472055/5eff9923cbda/medinform_v10i8e39057_fig3.jpg