Classification of the Disposition of Patients Hospitalized with COVID-19: Reading Discharge Summaries Using Natural Language Processing.

Authors

Fernandes Marta, Sun Haoqi, Jain Aayushee, Alabsi Haitham S, Brenner Laura N, Ye Elissa, Ge Wendong, Collens Sarah I, Leone Michael J, Das Sudeshna, Robbins Gregory K, Mukerji Shibani S, Westover M Brandon

Affiliations

Department of Neurology, Massachusetts General Hospital, Boston, MA, United States.

Clinical Data Animation Center, Boston, MA, United States.

Publication information

JMIR Med Inform. 2021 Feb 10;9(2):e25457. doi: 10.2196/25457.

DOI:10.2196/25457

PMID:33449908

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7879729/

Abstract

BACKGROUND

Medical notes are a rich source of patient data; however, the nature of unstructured text has largely precluded the use of these data for large retrospective analyses. Transforming clinical text into structured data can enable large-scale research studies with electronic health records (EHR) data. Natural language processing (NLP) can be used for text information retrieval, reducing the need for labor-intensive chart review. Here we present an application of NLP to large-scale analysis of medical records at 2 large hospitals for patients hospitalized with COVID-19.

OBJECTIVE

Our study goal was to develop an NLP pipeline to classify the discharge disposition (home, inpatient rehabilitation, skilled nursing inpatient facility [SNIF], and death) of patients hospitalized with COVID-19 based on hospital discharge summary notes.

METHODS

Text mining and feature engineering were applied to unstructured text from hospital discharge summaries. The study included patients with COVID-19 discharged from 2 hospitals in the Boston, Massachusetts area (Massachusetts General Hospital and Brigham and Women's Hospital) between March 10, 2020, and June 30, 2020. The data were divided into a training set (70%) and hold-out test set (30%). Discharge summaries were represented as bags-of-words consisting of single words (unigrams), bigrams, and trigrams. The number of features was reduced during training by excluding n-grams that occurred in fewer than 10% of discharge summaries, and further reduced using least absolute shrinkage and selection operator (LASSO) regularization while training a multiclass logistic regression model. Model performance was evaluated using the hold-out test set.
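The feature-construction steps described above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the authors' code: the whitespace tokenizer, the function names (`ngram_list`, `build_vocabulary`, `vectorize`, `split_train_test`), and the fixed random seed are all assumptions made for the example. The LASSO-regularized multiclass logistic regression step is not shown.

```python
import random
from collections import Counter


def ngram_list(text, n_max=3):
    """Return all unigrams, bigrams, and trigrams of a note, in order."""
    tokens = text.lower().split()  # assumed tokenizer: lowercase + whitespace
    return [
        " ".join(tokens[i:i + n])
        for n in range(1, n_max + 1)
        for i in range(len(tokens) - n + 1)
    ]


def build_vocabulary(train_notes, min_doc_fraction=0.10):
    """Keep n-grams that occur in at least min_doc_fraction of the notes."""
    doc_freq = Counter()
    for note in train_notes:
        doc_freq.update(set(ngram_list(note)))  # count each note at most once
    threshold = min_doc_fraction * len(train_notes)
    return sorted(g for g, df in doc_freq.items() if df >= threshold)


def vectorize(note, vocabulary):
    """Bag-of-n-grams count vector for one note, aligned with vocabulary."""
    counts = Counter(ngram_list(note))
    return [counts[g] for g in vocabulary]


def split_train_test(items, train_fraction=0.7, seed=0):
    """Shuffle with a fixed seed, then split into training and hold-out sets."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]
```

In this sketch the vocabulary is built from the training notes only, so the hold-out notes are vectorized against a fixed feature list and do not influence which n-grams survive the document-frequency filter.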

RESULTS

The study cohort included 1737 adult patients (median age 61 [SD 18] years; 55% men; 45% White and 16% Black; 14% nonsurvivors and 61% discharged home). The model selected 179 from a vocabulary of 1056 engineered features, consisting of combinations of unigrams, bigrams, and trigrams. The top features contributing most to the classification by the model (for each outcome) were the following: "appointments specialty," "home health," and "home care" (home); "intubate" and "ARDS" (inpatient rehabilitation); "service" (SNIF); "brief assessment" and "covid" (death). The model achieved a micro-average area under the receiver operating characteristic curve value of 0.98 (95% CI 0.97-0.98) and average precision of 0.81 (95% CI 0.75-0.84) in the testing set for prediction of discharge disposition.
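To make the "micro-average" in the reported metric concrete, the sketch below pools every (patient, class) pair into a single binary problem and computes the area under the ROC curve on the pooled scores. This is one common definition of micro-averaging; the abstract does not say which implementation the authors used, and the function names and the dictionary-based probability format are assumptions for illustration.

```python
def roc_auc(labels, scores):
    """AUC as the chance a positive outranks a negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def micro_average_auc(true_classes, probabilities, classes):
    """Pool all one-vs-rest (patient, class) pairs, then compute one AUC."""
    pooled_labels, pooled_scores = [], []
    for truth, probs in zip(true_classes, probabilities):
        for c in classes:
            pooled_labels.append(1 if truth == c else 0)
            pooled_scores.append(probs[c])
    return roc_auc(pooled_labels, pooled_scores)
```

Because pooling weights every patient-class pair equally, frequent outcomes such as discharge home contribute more positives to a micro-averaged score than rare ones. The pairwise loop is quadratic and is meant for clarity rather than speed.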

CONCLUSIONS

A supervised learning-based NLP approach is able to classify the discharge disposition of patients hospitalized with COVID-19. This approach has the potential to accelerate and increase the scale of research on patients' discharge disposition that is possible with EHR data.
