Cerebrovascular disease case identification in inpatient electronic medical record data using natural language processing.

Author information

Pan Jie, Zhang Zilong, Peters Steven Ray, Vatanpour Shabnam, Walker Robin L, Lee Seungwon, Martin Elliot A, Quan Hude

Affiliations

Centre for Health Informatics, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada.

Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada.

Publication information

Brain Inform. 2023 Sep 2;10(1):22. doi: 10.1186/s40708-023-00203-w.

DOI:10.1186/s40708-023-00203-w

PMID:37658963

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC10474977/

Abstract

BACKGROUND

Abstracting cerebrovascular disease (CeVD) from inpatient electronic medical records (EMRs) through natural language processing (NLP) is pivotal for automated disease surveillance and for improving patient outcomes. Existing methods rely on manual abstraction by coders, which is subject to time delays and under-coding. This study sought to develop an NLP-based method to detect CeVD using EMR clinical notes.

METHODS

CeVD status was confirmed through chart review of randomly selected hospitalized patients aged 18 years or older who were discharged from 3 hospitals in Calgary, Alberta, Canada, between January 1 and June 30, 2015. These patients' chart data were linked to the administrative Discharge Abstract Database (DAD) and Sunrise Clinical Manager (SCM) EMR database records by Personal Health Number (a unique lifetime identifier) and admission date. We trained multiple NLP predictive models by combining two clinical concept extraction methods with two supervised machine learning (ML) methods: random forest and XGBoost. Using chart review as the reference standard, we compared model performance with that of the commonly applied International Classification of Diseases, 10th Revision, Canada (ICD-10-CA) codes on sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV).
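All four validity metrics named above derive from a 2×2 comparison of a method's predictions against the chart-review reference standard. The following is a minimal Python sketch, assuming binary labels (1 = CeVD present, 0 = absent). The function name `validity_metrics` and the toy label lists are illustrative assumptions, not the study's code or data.

```python
def validity_metrics(reference, predicted):
    """Return sensitivity, specificity, PPV and NPV for binary labels.

    reference: chart-review labels (1 = CeVD, 0 = no CeVD)
    predicted: labels from the method being evaluated (NLP model or ICD codes)
    """
    if len(reference) != len(predicted):
        raise ValueError("label lists must have the same length")

    pairs = list(zip(reference, predicted))
    tp = sum(1 for r, p in pairs if r == 1 and p == 1)  # true positives
    fn = sum(1 for r, p in pairs if r == 1 and p == 0)  # missed cases
    fp = sum(1 for r, p in pairs if r == 0 and p == 1)  # false alarms
    tn = sum(1 for r, p in pairs if r == 0 and p == 0)  # true negatives

    def ratio(numerator, denominator):
        # Undefined metrics (empty denominator) are reported as NaN.
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


# Toy example with hypothetical labels for ten patients (not study data).
reference = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]
metrics = validity_metrics(reference, predicted)
```

Sensitivity and specificity are conditioned on the true status, whereas PPV and NPV are conditioned on the prediction and therefore shift with disease prevalence in the sample.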

RESULTS

Of the study sample (n = 3036), the prevalence of CeVD was 11.8% (n = 360), the median patient age was 63 years, and females accounted for 50.3% (n = 1528), based on chart data. Among 49 clinical documents extracted from the EMR, four document types were identified as the most influential text sources for identifying CeVD: "nursing transfer report," "discharge summary," "nursing notes," and "inpatient consultation." The best-performing NLP model was XGBoost, combining Unified Medical Language System (UMLS) concepts extracted by cTAKES (top-ranked concepts included "Cerebrovascular accident" and "Transient ischemic attack") with a term frequency-inverse document frequency (TF-IDF) vectorizer. Compared with ICD codes, the model achieved higher validity overall (ICD codes vs. model): sensitivity 25.0% vs. 70.0%, specificity 99.3% vs. 99.1%, PPV 82.6% vs. 87.8%, and NPV 90.8% vs. 97.1%.
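To make the feature representation concrete: term frequency-inverse document frequency weighting scores a concept highly when it is frequent in one patient's notes but rare across patients. The sketch below uses the textbook form tf × ln(N/df) in plain Python. The abstract does not state the study's vectorizer settings, so this variant is an assumption, and the concept lists (including "Hypertension" and "Diabetes mellitus") are hypothetical stand-ins for cTAKES output.

```python
import math
from collections import Counter


def tfidf(documents):
    """Compute textbook TF-IDF weights.

    documents: list of documents, each a list of extracted concept strings.
    Returns one {concept: weight} dict per document.
    """
    n_docs = len(documents)

    # Document frequency: in how many documents does each concept appear?
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))

    vectors = []
    for doc in documents:
        counts = Counter(doc)
        total = len(doc)
        vectors.append({
            concept: (count / total) * math.log(n_docs / doc_freq[concept])
            for concept, count in counts.items()
        })
    return vectors


# Hypothetical concept lists for three patients (not study data).
notes = [
    ["Cerebrovascular accident", "Hypertension", "Cerebrovascular accident"],
    ["Hypertension", "Diabetes mellitus"],
    ["Transient ischemic attack", "Hypertension"],
]
vectors = tfidf(notes)
```

In this toy input, "Hypertension" appears for every patient and so receives zero weight, while "Cerebrovascular accident" is both repeated and unique to the first patient and receives the largest weight. Vectors of this kind would then serve as input features for a classifier such as random forest or XGBoost.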

CONCLUSION

The NLP algorithm developed in this study performed better than the ICD code algorithm in detecting CeVD. The NLP models could support an automated EMR tool for identifying CeVD cases and could be applied in future work such as disease surveillance and longitudinal studies.
