Automated Extraction of Stroke Severity from Unstructured Electronic Health Records using Natural Language Processing.

Author information

Fernandes Marta, Westover M Brandon, Singhal Aneesh B, Zafar Sahar F

Affiliations

Department of Neurology, Massachusetts General Hospital (MGH), Boston, Massachusetts, United States.

Department of Neurology, Beth Israel Deaconess Medical Center (BIDMC), Boston, Massachusetts, United States.

Publication information

medRxiv. 2024 Mar 11:2024.03.08.24304011. doi: 10.1101/2024.03.08.24304011.

PMID:38559062

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10980121/

Abstract

BACKGROUND

Multi-center electronic health records (EHR) can support quality improvement initiatives and comparative effectiveness research in stroke care. However, EHR-based research is limited by the difficulty of abstracting key clinical variables from unstructured data at scale, a problem further compounded by missing data. Here we develop a natural language processing (NLP) model that automatically reads EHR notes to determine the NIH Stroke Scale (NIHSS) score of patients with acute stroke.

METHODS

The study included notes from adult patients (≥18 years) with acute stroke admitted to Massachusetts General Hospital (MGH) between 2015 and 2022. The MGH data were divided into a training set (70%) and a hold-out test set (30%). A two-stage model was developed to predict the admission NIHSS. A linear model with least absolute shrinkage and selection operator (LASSO) regularization was trained on the training set. For test-set notes in which the NIHSS was documented, the score was extracted using regular expressions (stage 1); for notes in which the NIHSS was not documented, the LASSO model was used to predict it (stage 2). The reference-standard NIHSS was obtained from the Get With The Guidelines-Stroke registry. The two-stage model was evaluated on the MGH hold-out test set and validated on the MIMIC-III v1.4 dataset (Medical Information Mart for Intensive Care III, 2001-2012), using root mean squared error (RMSE) and Spearman correlation (SC).
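
The two-stage procedure described in the Methods (regular expressions first, model prediction only when no score is written) can be sketched as follows. This is a minimal illustration, not the authors' code: the regular expression, the rule of taking the first in-range match, and clipping model output to the 0-42 NIHSS range are all assumptions, and `stage2_model` is a hypothetical callable standing in for the trained LASSO regressor (for example, a scikit-learn `Lasso` model wrapped together with whatever note featurization it was trained on).

```python
import re
from typing import Callable, Optional

# Hypothetical pattern: "NIHSS" or "NIH stroke scale", an optional "score",
# an optional connector ("of", "is", "was", ":" or "="), then a 1-2 digit integer.
# Real clinical notes are far messier; this pattern is illustrative only.
NIHSS_PATTERN = re.compile(
    r"\b(?:NIHSS|NIH\s+stroke\s+scale)(?:\s+score)?\s*(?:of|is|was|:|=)?\s*(\d{1,2})\b",
    re.IGNORECASE,
)

NIHSS_MIN, NIHSS_MAX = 0, 42  # valid range of the NIHSS total score


def extract_nihss(note: str) -> Optional[int]:
    """Stage 1: return the first in-range NIHSS documented in the note, else None.

    Taking the first match is an assumption; a real pipeline would need rules
    for notes that mention several scores (e.g. admission vs. follow-up).
    """
    for match in NIHSS_PATTERN.finditer(note):
        score = int(match.group(1))
        if NIHSS_MIN <= score <= NIHSS_MAX:
            return score
    return None


def two_stage_nihss(note: str, stage2_model: Callable[[str], float]) -> float:
    """Use the regex-extracted score when present; otherwise ask the model.

    `stage2_model` is a hypothetical callable standing in for the trained
    LASSO regressor (including whatever note featurization it expects).
    """
    documented = extract_nihss(note)
    if documented is not None:
        return float(documented)
    predicted = stage2_model(note)
    # Assumption: clip to the valid range, since a linear model can extrapolate.
    return min(max(predicted, float(NIHSS_MIN)), float(NIHSS_MAX))


if __name__ == "__main__":
    def constant_model(note: str) -> float:
        # Hypothetical stand-in for a fitted LASSO model.
        return 4.0

    print(two_stage_nihss("Exam: NIHSS 12 on arrival.", constant_model))  # 12.0
    print(two_stage_nihss("Left hemiparesis and aphasia.", constant_model))  # 4.0
```

Keeping stage 1 purely rule-based means a documented score is always preferred over a model estimate, so the regression model only has to handle notes in which no score was written.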

RESULTS

We included 4,163 patients (MGH = 3,876; MIMIC = 287) with a mean age of 69 (SD 15) years; 53% were male and 72% were white. Ninety percent of patients had ischemic stroke and 10% had hemorrhagic stroke. The two-stage model achieved an RMSE [95% CI] of 3.13 [2.86-3.41] (SC = 0.90 [0.88-0.91]) in the MGH hold-out test set and 2.01 [1.58-2.38] (SC = 0.96 [0.94-0.97]) in the MIMIC validation set.
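
The metrics reported in the Results (RMSE and Spearman correlation, each with a 95% CI) can be computed with the Python standard library alone. This is a minimal sketch, not the study's analysis code: the abstract does not state how the confidence intervals were derived, so the percentile bootstrap over patients, `n_boot=1000`, and the fixed seed are assumptions. Spearman correlation is computed as the Pearson correlation of tie-averaged ranks, which matters here because NIHSS values are integers and ties are common.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Metric = Callable[[Sequence[float], Sequence[float]], float]


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error between reference and predicted scores."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of 1-based positions i..j
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the tie-averaged ranks.

    Returns NaN when either input is constant (correlation undefined).
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        return float("nan")
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    return sxy / math.sqrt(sxx * syy)


def bootstrap_ci(
    y_true: Sequence[float],
    y_pred: Sequence[float],
    metric: Metric,
    n_boot: int = 1000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap CI, resampling patients with replacement (assumed scheme)."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        value = metric([y_true[i] for i in idx], [y_pred[i] for i in idx])
        if not math.isnan(value):  # drop degenerate resamples (e.g. constant scores)
            stats.append(value)
    stats.sort()
    lower = stats[int(alpha / 2 * (len(stats) - 1))]
    upper = stats[int((1 - alpha / 2) * (len(stats) - 1))]
    return lower, upper


if __name__ == "__main__":
    # Toy scores for illustration only; these are not data from the study.
    truth = [0, 2, 4, 7, 10, 15, 21, 3, 1, 8]
    pred = [1, 2, 5, 6, 11, 13, 20, 3, 0, 9]
    print(f"RMSE = {rmse(truth, pred):.2f}, 95% CI = {bootstrap_ci(truth, pred, rmse)}")
    print(f"Spearman = {spearman(truth, pred):.2f}, 95% CI = {bootstrap_ci(truth, pred, spearman)}")
```

In practice `scipy.stats.spearmanr` gives the same tie-averaged coefficient; the hand-rolled version is shown only to make the definition explicit.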

CONCLUSIONS

This automated NLP-based model can enable large-scale stroke severity phenotyping from EHRs and can therefore support real-world quality improvement and comparative effectiveness studies in stroke.

Figure: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/61d6/10980121/5889211fbb35/nihpp-2024.03.08.24304011v1-f0001.jpg

Similar articles

Automated Extraction of Stroke Severity from Unstructured Electronic Health Records using Natural Language Processing.

medRxiv. 2024 Mar 11:2024.03.08.24304011. doi: 10.1101/2024.03.08.24304011.

Automated Extraction of Stroke Severity From Unstructured Electronic Health Records Using Natural Language Processing.

J Am Heart Assoc. 2024 Nov 5;13(21):e036386. doi: 10.1161/JAHA.124.036386. Epub 2024 Oct 25.

Assessing stroke severity using electronic health record data: a machine learning approach.

BMC Med Inform Decis Mak. 2020 Jan 8;20(1):8. doi: 10.1186/s12911-019-1010-x.

Classification of the Disposition of Patients Hospitalized with COVID-19: Reading Discharge Summaries Using Natural Language Processing.

JMIR Med Inform. 2021 Feb 10;9(2):e25457. doi: 10.2196/25457.

Automatic quantitative stroke severity assessment based on Chinese clinical named entity recognition with domain-adaptive pre-trained large language model.

Artif Intell Med. 2024 Apr;150:102822. doi: 10.1016/j.artmed.2024.102822. Epub 2024 Feb 27.

Classifying social determinants of health from unstructured electronic health records using deep learning-based natural language processing.

J Biomed Inform. 2022 Mar;127:103984. doi: 10.1016/j.jbi.2021.103984. Epub 2022 Jan 7.

Comparing Natural Language Processing and Structured Medical Data to Develop a Computable Phenotype for Patients Hospitalized Due to COVID-19: Retrospective Analysis.

JMIR Med Inform. 2023 Aug 22;11:e46267. doi: 10.2196/46267.

Development of a Natural Language Processing (NLP) model to automatically extract clinical data from electronic health records: results from an Italian comprehensive stroke center.

Int J Med Inform. 2024 Dec;192:105626. doi: 10.1016/j.ijmedinf.2024.105626. Epub 2024 Sep 19.

Developing a FHIR-based EHR phenotyping framework: A case study for identification of patients with obesity and multiple comorbidities from discharge summaries.

J Biomed Inform. 2019 Nov;99:103310. doi: 10.1016/j.jbi.2019.103310. Epub 2019 Oct 14.

Identifying stroke-related quantified evidence from electronic health records in real-world studies.

Artif Intell Med. 2023 Jun;140:102552. doi: 10.1016/j.artmed.2023.102552. Epub 2023 Apr 23.
