Validation of Prediction Models for Critical Care Outcomes Using Natural Language Processing of Electronic Health Record Data.

Author Affiliations

Philip R. Lee Institute for Health Policy Studies, School of Medicine, University of California, San Francisco.

Center for Healthcare Value, University of California, San Francisco.

Publication Information

JAMA Netw Open. 2018 Dec 7;1(8):e185097. doi: 10.1001/jamanetworkopen.2018.5097.

DOI:10.1001/jamanetworkopen.2018.5097

PMID:30646310

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6324323/

Abstract

IMPORTANCE

Accurate prediction of outcomes among patients in intensive care units (ICUs) is important for clinical research and monitoring care quality. Most existing prediction models do not take full advantage of the electronic health record, using only the single worst value of laboratory tests and vital signs and largely ignoring information present in free-text notes. Whether capturing more of the available data and applying machine learning and natural language processing (NLP) can improve and automate the prediction of outcomes among patients in the ICU remains unknown.

OBJECTIVES

To evaluate the change in power for a mortality prediction model among patients in the ICU achieved by incorporating measures of clinical trajectory together with NLP of clinical text and to assess the generalizability of this approach.

DESIGN, SETTING, AND PARTICIPANTS

This retrospective cohort study included 101 196 patients with a first-time admission to the ICU and a length of stay of at least 4 hours. Twenty ICUs at 2 academic medical centers (University of California, San Francisco [UCSF], and Beth Israel Deaconess Medical Center [BIDMC], Boston, Massachusetts) and 1 community hospital (Mills-Peninsula Medical Center [MPMC], Burlingame, California) contributed data from January 1, 2001, through June 1, 2017. Data were analyzed from July 1, 2017, through August 1, 2018.

MAIN OUTCOMES AND MEASURES

In-hospital mortality and model discrimination as assessed by the area under the receiver operating characteristic curve (AUC) and model calibration as assessed by the modified Hosmer-Lemeshow statistic.
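The two measures named above can be illustrated with a minimal Python sketch. It is not the authors' code: `auc` computes discrimination as the fraction of (died, survived) patient pairs in which the patient who died received the higher predicted risk, and `hosmer_lemeshow` computes the classic Hosmer-Lemeshow chi-square over risk-ordered groups. The abstract refers to a modified Hosmer-Lemeshow statistic whose exact form is not given here, so the classic version is an assumption, as are the function names and the default of 10 groups.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive case outranks a random negative case.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def hosmer_lemeshow(labels: Sequence[int], probs: Sequence[float], groups: int = 10) -> float:
    """Classic Hosmer-Lemeshow chi-square (assumed form, not the paper's modified one).

    Patients are sorted by predicted risk and split into roughly equal groups;
    each group contributes (observed - expected)^2 / (n * p_bar * (1 - p_bar)).
    """
    pairs = sorted(zip(probs, labels), key=lambda pair: pair[0])
    total = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * total // groups:(g + 1) * total // groups]
        if not chunk:
            continue
        expected = sum(p for p, _ in chunk)  # predicted number of deaths
        observed = sum(y for _, y in chunk)  # actual number of deaths
        mean_p = expected / len(chunk)
        denom = len(chunk) * mean_p * (1.0 - mean_p)
        if denom > 0:
            stat += (observed - expected) ** 2 / denom
    return stat
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking, while a smaller Hosmer-Lemeshow value indicates that predicted and observed death counts agree more closely within each risk group.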

RESULTS

Among 101 196 patients included in the analysis, 51.3% (n = 51 899) were male, with a mean (SD) age of 61.3 (17.1) years; their in-hospital mortality rate was 10.4% (n = 10 505). A baseline model using only the highest and lowest observed values for each laboratory test result or vital sign achieved a cross-validated AUC of 0.831 (95% CI, 0.830-0.832). In contrast, that model augmented with measures of clinical trajectory achieved an AUC of 0.899 (95% CI, 0.896-0.902; P < .001 for AUC difference). Further augmenting this model with NLP-derived terms associated with mortality further increased the AUC to 0.922 (95% CI, 0.916-0.924; P < .001). These NLP-derived terms were associated with improved model performance even when applied across sites (AUC difference for UCSF: 0.077 to 0.021; AUC difference for MPMC: 0.071 to 0.051; AUC difference for BIDMC: 0.035 to 0.043; P < .001) when augmenting with NLP at each site.
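The three feature families compared in these results can be sketched in Python as follows. This is an illustration under stated assumptions, not the study's pipeline: the abstract does not define its trajectory measures, so the least-squares slope and net change used here are hypothetical stand-ins, and the note terms in the usage example are invented for illustration rather than taken from the paper.

```python
from typing import Dict, Sequence


def extreme_features(values: Sequence[float]) -> Dict[str, float]:
    """Baseline-style summary: only the lowest and highest observed value."""
    return {"min": min(values), "max": max(values)}


def trajectory_features(times: Sequence[float], values: Sequence[float]) -> Dict[str, float]:
    """Hypothetical trajectory summary: least-squares slope and net change."""
    n = len(values)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    var_t = sum((t - mean_t) ** 2 for t in times)
    cov = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = cov / var_t if var_t > 0 else 0.0
    return {"slope": slope, "delta": values[-1] - values[0]}


def term_features(note: str, terms: Sequence[str]) -> Dict[str, int]:
    """Binary indicators: 1 if the term occurs in the note (case-insensitive substring)."""
    text = note.lower()
    return {f"has_{term}": int(term.lower() in text) for term in terms}


# Two illustrative lactate series (made-up values) with identical extremes.
hours = [0, 6, 12, 18]
improving = [4.0, 3.0, 2.0, 1.0]
worsening = [1.0, 2.0, 3.0, 4.0]

same_extremes = extreme_features(improving) == extreme_features(worsening)
improving_slope = trajectory_features(hours, improving)["slope"]
worsening_slope = trajectory_features(hours, worsening)["slope"]

# Hypothetical note text and term list, for illustration only.
flags = term_features("Family meeting held; comfort care discussed.",
                      ["comfort care", "intubated"])
```

The two series are indistinguishable to a model that sees only the minimum and maximum, yet their slopes have opposite signs, which is the kind of information a trajectory feature adds. The term matcher is deliberately naive: plain substring matching ignores negation and context, which a real clinical NLP step would need to handle.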

CONCLUSIONS AND RELEVANCE

Intensive care unit mortality prediction models incorporating measures of clinical trajectory and NLP-derived terms yielded excellent predictive performance and generalized well in this sample of hospitals. The role of these automated algorithms, particularly those using unstructured data from notes and other sources, in clinical research and quality improvement seems to merit additional investigation.
