Automated Extraction of Stroke Severity From Unstructured Electronic Health Records Using Natural Language Processing.

Affiliations

Department of Neurology, Massachusetts General Hospital (MGH), Boston, MA.

Department of Neurology, Beth Israel Deaconess Medical Center (BIDMC), Boston, MA.

Publication information

J Am Heart Assoc. 2024 Nov 5;13(21):e036386. doi: 10.1161/JAHA.124.036386. Epub 2024 Oct 25.

Abstract

BACKGROUND

Multicenter electronic health records can support quality improvement and comparative effectiveness research in stroke. However, electronic health record-based research is limited by missing data and by the difficulty of abstracting key clinical variables such as stroke severity. We developed a natural language processing model that reads electronic health record notes, directly extracts the National Institutes of Health Stroke Scale score when it is documented, and predicts the score from the clinical documentation when it is missing.
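
The extract-or-predict logic described above can be sketched in Python. This is a minimal illustration under stated assumptions: the regular expression, the policy of returning the first in-range match, and the helper names `extract_nihss` and `two_stage_nihss` are hypothetical and are not the authors' published patterns. The fallback `predict` argument is any callable standing in for a trained prediction model.

```python
import re
from typing import Callable, Optional

# Hypothetical pattern: "NIHSS" or "NIH Stroke Scale", optionally followed by
# words or symbols such as "score", "total", "of", "is", "=", ":", then an integer.
NIHSS_PATTERN = re.compile(
    r"\b(?:NIHSS|NIH\s+stroke\s+scale)\b(?:\s*(?:score|total|of|is|=|:))*\s*(\d{1,2})\b",
    re.IGNORECASE,
)


def extract_nihss(note: str) -> Optional[int]:
    """Stage 1: return the first documented NIHSS score in the note, or None."""
    for match in NIHSS_PATTERN.finditer(note):
        score = int(match.group(1))
        if 0 <= score <= 42:  # the NIHSS total ranges from 0 to 42
            return score
    return None


def two_stage_nihss(note: str, predict: Callable[[str], float]) -> float:
    """Use the documented score if one is found; otherwise fall back to a model."""
    score = extract_nihss(note)
    if score is not None:
        return float(score)
    return predict(note)  # stage 2: hypothetical trained regression model
```

A note may contain several scores (for example, serial examinations during an admission); returning the first in-range match is only one possible policy, and a real pipeline would need a rule for selecting the admission score.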

METHODS AND RESULTS

The study included notes from adult patients (aged ≥18 years) with acute stroke admitted to Massachusetts General Hospital (MGH) from 2015 to 2022. The MGH data were split into a training set (70%) and a holdout test set (30%). We developed a 2-stage model to predict the admission National Institutes of Health Stroke Scale (NIHSS) score recorded in the GWTG (Get With The Guidelines) stroke registry. For test notes in which the NIHSS score was documented, the score was extracted with regular expressions (stage 1); when it was not documented, a model trained with the least absolute shrinkage and selection operator (LASSO) predicted it (stage 2). The 2-stage model was evaluated on the holdout test set and externally validated in the Medical Information Mart for Intensive Care (MIMIC) version 1.4 (2001-2012), using root mean squared error (RMSE) and Spearman correlation.

We included 4163 patients (MGH, 3876; MIMIC, 287); mean age was 69 (SD, 15) years, 53% were men, and 72% were White. The model achieved an RMSE of 2.89 (95% CI, 2.62-3.19) and a Spearman correlation of 0.92 (95% CI, 0.91-0.93) in the MGH test set, and an RMSE of 2.20 (95% CI, 1.69-2.66) and a Spearman correlation of 0.96 (95% CI, 0.94-0.97) in the MIMIC validation set.

CONCLUSIONS

The automated natural language processing-based model can enable large-scale stroke severity phenotyping from electronic health records and support real-world quality improvement and comparative effectiveness studies in stroke.

[Figure: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4a5c/11935650/0cad3601a969/JAH3-13-e036386-g002.jpg]
