Automated Extraction of Stroke Severity From Unstructured Electronic Health Records Using Natural Language Processing.

Affiliations

Department of Neurology, Massachusetts General Hospital (MGH), Boston, MA.

Department of Neurology, Beth Israel Deaconess Medical Center (BIDMC), Boston, MA.

Publication information

J Am Heart Assoc. 2024 Nov 5;13(21):e036386. doi: 10.1161/JAHA.124.036386. Epub 2024 Oct 25.

DOI: 10.1161/JAHA.124.036386
PMID: 39450737
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11935650/
Abstract

BACKGROUND

Multicenter electronic health records can support quality improvement and comparative effectiveness research in stroke. However, limitations of electronic health record-based research include challenges in abstracting key clinical variables, including stroke severity, along with missing data. We developed a natural language processing model that reads electronic health record notes to directly extract the National Institutes of Health Stroke Scale score when documented and predict the score from clinical documentation when missing.
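The model has two behaviors: read the score when it is documented, and predict it when it is missing. These can be sketched as a simple dispatch. This is a minimal Python sketch under stated assumptions, not the authors' code:

- The paper does not publish its regular expressions, so `NIHSS_PATTERN` is a hypothetical pattern.
- `predict` is a hypothetical stand-in for whatever stage-2 model supplies a score.

```python
import re
from typing import Callable, Optional

# HYPOTHETICAL pattern: the paper's regular expressions are not published.
# Intended to match phrasings such as "NIHSS 7", "NIHSS: 12" or
# "NIH stroke scale score of 4".
NIHSS_PATTERN = re.compile(
    r"\b(?:NIHSS|NIH\s*SS|NIH\s+stroke\s+scale)(?:\s+score)?"
    r"\s*(?:of|was|is|=|:)?\s*(\d{1,2})\b",
    re.IGNORECASE,
)

NIHSS_MAX = 42  # the NIHSS total score ranges from 0 to 42


def extract_nihss(note: str) -> Optional[int]:
    """Stage 1: return the first documented, in-range NIHSS score, or None."""
    for match in NIHSS_PATTERN.finditer(note):
        score = int(match.group(1))
        if 0 <= score <= NIHSS_MAX:
            return score
    return None


def two_stage_nihss(note: str, predict: Callable[[str], float]) -> float:
    """Use the documented score when present; otherwise defer to a stage-2 model."""
    documented = extract_nihss(note)
    if documented is not None:  # a documented score of 0 is valid, so test for None
        return float(documented)
    return predict(note)


if __name__ == "__main__":
    def fake_model(note: str) -> float:
        return 5.0  # HYPOTHETICAL placeholder for a trained stage-2 predictor

    print(two_stage_nihss("Exam: left hemiparesis. NIHSS: 12.", fake_model))  # 12.0
    print(two_stage_nihss("Left hemiparesis and neglect.", fake_model))      # 5.0
```

The `is not None` check is deliberate. An NIHSS of 0 is a real, documented score and must not fall through to the predictive model.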

METHODS AND RESULTS

The study included notes from patients with acute stroke (aged ≥18 years) admitted to Massachusetts General Hospital (MGH; 2015-2022). The MGH data were divided into training and holdout test sets (70%/30%). We developed a 2-stage model to predict the admission National Institutes of Health Stroke Scale (NIHSS) score, obtained from the GWTG (Get With The Guidelines) stroke registry. We trained a model with the least absolute shrinkage and selection operator (LASSO). For test notes with a documented NIHSS, scores were extracted using regular expressions (stage 1); when the score was not documented, LASSO was used for prediction (stage 2). The 2-stage model was tested on the holdout test set and validated in the Medical Information Mart for Intensive Care (MIMIC; 2001-2012) version 1.4, using root mean squared error (RMSE) and Spearman correlation. We included 4163 patients (MGH, 3876; MIMIC, 287); average age, 69 (SD, 15) years; 53% men; and 72% White individuals. The model achieved an RMSE of 2.89 (95% CI, 2.62-3.19) and a Spearman correlation of 0.92 (95% CI, 0.91-0.93) in the MGH test set, and an RMSE of 2.20 (95% CI, 1.69-2.66) and a Spearman correlation of 0.96 (95% CI, 0.94-0.97) in the MIMIC validation set.
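Stage 2 and the evaluation metrics of the 2-stage model can also be sketched in plain Python. This is a hedged illustration with several assumptions:

- **Features.** Each note is assumed to have already been turned into a numeric feature vector. The abstract does not describe the features, so vectorization is omitted.
- **Model.** LASSO is fit by cyclic coordinate descent on the objective (1/2n)·||y - Xw||² + λ·||w||₁ with an unpenalized intercept. This is the same objective scikit-learn's `Lasso` minimizes, and an established implementation like that would normally be preferred in practice.
- **Metrics.** RMSE is computed directly, and Spearman correlation is computed as the Pearson correlation of average ranks.
- **Confidence intervals.** The abstract does not say how the 95% CIs were obtained, so they are not reproduced.
- **Names.** `fit_lasso`, `predict_lasso`, `rmse`, `average_ranks`, and `spearman` are hypothetical helpers, not the authors' code.

```python
import math
from typing import List, Sequence, Tuple


def soft_threshold(rho: float, lam: float) -> float:
    """Proximal operator of lam * |w|: shrinks rho toward zero by lam."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def fit_lasso(X: List[List[float]], y: List[float], lam: float,
              n_iter: int = 200) -> Tuple[float, List[float]]:
    """Coordinate descent for (1/2n)||y - Xw||^2 + lam*||w||_1, unpenalized intercept."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]  # centering absorbs the intercept
    col_sq = [sum(r[j] ** 2 for r in Xc) / n for j in range(p)]
    w = [0.0] * p
    resid = [v - y_mean for v in y]  # residual of the centered problem while w == 0
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:  # constant feature: nothing to fit
                continue
            # rho_j = (1/n) x_j . (resid + x_j w_j), the fit to the partial residual
            rho = sum(Xc[i][j] * (resid[i] + Xc[i][j] * w[j]) for i in range(n)) / n
            new_wj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_wj - w[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= Xc[i][j] * delta  # keep resid = yc - Xc w up to date
                w[j] = new_wj
    intercept = y_mean - sum(x_mean[j] * w[j] for j in range(p))
    return intercept, w


def predict_lasso(intercept: float, w: Sequence[float], x: Sequence[float]) -> float:
    return intercept + sum(wj * xj for wj, xj in zip(w, x))


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of average ranks (handles ties)."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0.0 or sy == 0.0:
        raise ValueError("Spearman correlation is undefined for constant input")
    return cov / (sx * sy)
```

The residual vector is updated in place after each coordinate step, so each full pass costs O(np) instead of recomputing every prediction. Raising λ drives more coefficients exactly to zero, which is the feature-selection behavior that distinguishes LASSO from ridge regression.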

CONCLUSIONS

The automatic natural language processing-based model can enable large-scale stroke severity phenotyping from the electronic health record and support real-world quality improvement and comparative effectiveness studies in stroke.

Figures (PMC11935650):
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4a5c/11935650/9f3a6bc25777/JAH3-13-e036386-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4a5c/11935650/0cad3601a969/JAH3-13-e036386-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4a5c/11935650/967a226bd308/JAH3-13-e036386-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4a5c/11935650/af1e93e66b6f/JAH3-13-e036386-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4a5c/11935650/c992c3a42728/JAH3-13-e036386-g005.jpg

Similar articles

1. Automated Extraction of Stroke Severity From Unstructured Electronic Health Records Using Natural Language Processing.
J Am Heart Assoc. 2024 Nov 5;13(21):e036386. doi: 10.1161/JAHA.124.036386. Epub 2024 Oct 25.
2. Automated Extraction of Stroke Severity from Unstructured Electronic Health Records using Natural Language Processing.
medRxiv. 2024 Mar 11:2024.03.08.24304011. doi: 10.1101/2024.03.08.24304011.
3. Comparison of Two Modern Survival Prediction Tools, SORG-MLA and METSSS, in Patients With Symptomatic Long-bone Metastases Who Underwent Local Treatment With Surgery Followed by Radiotherapy and With Radiotherapy Alone.
Clin Orthop Relat Res. 2024 Dec 1;482(12):2193-2208. doi: 10.1097/CORR.0000000000003185. Epub 2024 Jul 23.
4. A New Measure of Quantified Social Health Is Associated With Levels of Discomfort, Capability, and Mental and General Health Among Patients Seeking Musculoskeletal Specialty Care.
Clin Orthop Relat Res. 2025 Apr 1;483(4):647-663. doi: 10.1097/CORR.0000000000003394. Epub 2025 Feb 5.
5. Natural Language Processing of Clinical Documentation to Assess Functional Status in Patients With Heart Failure.
JAMA Netw Open. 2024 Nov 4;7(11):e2443925. doi: 10.1001/jamanetworkopen.2024.43925.
6. Extraction of sleep information from clinical notes of Alzheimer's disease patients using natural language processing.
J Am Med Inform Assoc. 2024 Oct 1;31(10):2217-2227. doi: 10.1093/jamia/ocae177.
7. Utilizing large language models for detecting hospital-acquired conditions: an empirical study on pulmonary embolism.
J Am Med Inform Assoc. 2025 May 1;32(5):876-884. doi: 10.1093/jamia/ocaf048.
8. Cost-effectiveness of using prognostic information to select women with breast cancer for adjuvant systemic therapy.
Health Technol Assess. 2006 Sep;10(34):iii-iv, ix-xi, 1-204. doi: 10.3310/hta10340.
9. Automated monitoring compared to standard care for the early detection of sepsis in critically ill patients.
Cochrane Database Syst Rev. 2018 Jun 25;6(6):CD012404. doi: 10.1002/14651858.CD012404.pub2.
10. Chlorhexidine mouthrinse as an adjunctive treatment for gingival health.
Cochrane Database Syst Rev. 2017 Mar 31;3(3):CD008676. doi: 10.1002/14651858.CD008676.pub2.

Cited by

1. Zero-Shot Extraction of Seizure Outcomes from Clinical Notes Using Generative Pretrained Transformers.
J Healthc Inform Res. 2025 Apr 29;9(3):380-400. doi: 10.1007/s41666-025-00198-5. eCollection 2025 Sep.
