Inclusion of Unstructured Clinical Text Improves Early Prediction of Death or Prolonged ICU Stay.

Affiliations

Division of Pulmonary, Allergy, and Critical Care, Department of Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA.

Palliative and Advanced Illness Research Center, Department of Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA.

Publication information

Crit Care Med. 2018 Jul;46(7):1125-1132. doi: 10.1097/CCM.0000000000003148.

DOI:10.1097/CCM.0000000000003148

PMID:29629986

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6005735/

Abstract

OBJECTIVES

Early prediction of undesired outcomes among newly hospitalized patients could improve patient triage and prompt conversations about patients' goals of care. We evaluated the performance of logistic regression, gradient boosting machine, random forest, and elastic net regression models, with and without unstructured clinical text data, to predict a binary composite outcome of in-hospital death or ICU length of stay greater than or equal to 7 days using data from the first 48 hours of hospitalization.

DESIGN

Retrospective cohort study with split sampling for model training and testing.

SETTING

A single urban academic hospital.

PATIENTS

All hospitalized patients who required ICU care at the Beth Israel Deaconess Medical Center in Boston, MA, from 2001 to 2012.

INTERVENTIONS

None.

MEASUREMENTS AND MAIN RESULTS

Among 25,947 eligible hospital admissions, we observed 5,504 (21.2%) in which patients died or had ICU length of stay greater than or equal to 7 days. The gradient boosting machine model had the highest discrimination both without text-derived variables (area under the receiver operating characteristic curve, 0.83; 95% CI, 0.81-0.84) and with them (area under the receiver operating characteristic curve, 0.89; 95% CI, 0.88-0.90). Both gradient boosting machines and random forests outperformed logistic regression without text data (p < 0.001), whereas all models outperformed logistic regression with text data (p < 0.02). The inclusion of text data increased the discrimination of all four model types (p < 0.001). Among those models using text data, the increasing presence of the terms "intubated" and "poor prognosis" was positively associated with mortality and ICU length of stay, whereas the term "extubated" was inversely associated with them.
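The discrimination figures above are areas under the receiver operating characteristic curve with 95% confidence intervals. The abstract does not say how the intervals were computed, so the following is only a minimal sketch under stated assumptions: it computes the area from ranks (the Mann-Whitney formulation) and attaches a percentile bootstrap interval. The function names `auc` and `bootstrap_auc_ci`, the number of resamples, and the seed are hypothetical choices for illustration, not the authors' method.

```python
import random


def auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney rank formulation.

    labels: sequence of 0/1 outcomes (1 = death or ICU stay >= 7 days).
    scores: predicted risks, higher meaning more likely positive.
    Tied scores receive the average of the ranks they span.
    """
    pairs = sorted(zip(scores, labels))
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both outcome classes must be present")
    rank_sum = 0.0
    i = 0
    while i < len(pairs):
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of 1-based ranks i+1 .. j
        rank_sum += avg_rank * sum(lab for _, lab in pairs[i:j])
        i = j
    return (rank_sum - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (an assumed method)."""
    auc(labels, scores)  # fail early if only one class is present
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[k] for k in idx]
        if 0 < sum(ys) < n:  # skip resamples that lost a class
            stats.append(auc(ys, [scores[k] for k in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper
```

Calling `auc(y_test, risk)` and `bootstrap_auc_ci(y_test, risk)` on held-out outcomes and predicted risks from any of the four model types would yield a point estimate and interval in the same form as those reported above. Comparing two models on the same admissions would additionally need a paired procedure, which this sketch does not cover.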

CONCLUSIONS

Variables extracted from unstructured clinical text from the first 48 hours of hospital admission using natural language processing techniques significantly improved the abilities of logistic regression and other machine learning models to predict which patients died or had long ICU stays. Learning health systems may adapt such models using open-source approaches to capture local variation in care patterns.
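As a rough illustration of what a text-derived variable can look like, the sketch below counts occurrences of the three terms named in the results within a free-text note. It is a deliberately simplified assumption, not the study's natural language processing pipeline, which the abstract does not describe; the names `TERMS` and `term_counts` are hypothetical.

```python
import re

# Terms highlighted in the abstract's results; the list is illustrative only.
TERMS = ("intubated", "poor prognosis", "extubated")


def term_counts(note, terms=TERMS):
    """Count whole-word, case-insensitive occurrences of each term in a note.

    Whitespace is collapsed first so that a phrase split across a line
    break (e.g. "poor\nprognosis") is still matched.
    """
    text = re.sub(r"\s+", " ", note.lower())
    counts = {}
    for term in terms:
        pattern = r"\b" + re.escape(term) + r"\b"
        counts[term] = len(re.findall(pattern, text))
    return counts
```

The resulting counts could be appended to the structured variables for each admission before model fitting. Note that this simple count ignores negation, so a phrase such as "not yet extubated" still increments the "extubated" count.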
