Identifying stroke-related quantified evidence from electronic health records in real-world studies.

Author affiliations

Institute of Medical Information and Library, Chinese Academy of Medical Sciences/Peking Union Medical College, Beijing 100020, China; Key Laboratory of Medical Information Intelligent Technology, Chinese Academy of Medical Sciences, Beijing 100020, China.

Institute of Medical Information and Library, Chinese Academy of Medical Sciences/Peking Union Medical College, Beijing 100020, China; School of Health Care Technology, Dalian Neusoft University of Information, Dalian 116023, China.

Publication information

Artif Intell Med. 2023 Jun;140:102552. doi: 10.1016/j.artmed.2023.102552. Epub 2023 Apr 23.

DOI:10.1016/j.artmed.2023.102552

PMID:37210153

Abstract

BACKGROUND

Stroke is one of the leading causes of death and disability worldwide. National Institutes of Health Stroke Scale (NIHSS) scores recorded in electronic health records (EHRs) quantitatively describe patients' neurological deficits in evidence-based treatment and are crucial in stroke-related clinical investigations. However, their free-text format and lack of standardization inhibit their effective use. Automatically extracting the scale scores from clinical free text, so that their potential value in real-world studies can be realized, has become an important goal.

OBJECTIVE

This study aims to develop an automated method to extract scale scores from the free text of EHRs.

METHODS

We propose a two-step pipeline method to identify NIHSS items and numerical scores and validate its feasibility using a freely accessible critical care database, MIMIC-III (Medical Information Mart for Intensive Care III). First, we use MIMIC-III to create an annotated corpus. Then, we investigate candidate machine learning methods for two subtasks: (1) NIHSS item and score recognition and (2) item-score relation extraction. We conduct both task-specific and end-to-end evaluations and compare our method with a rule-based method, using precision, recall and F1-score as evaluation metrics.
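The evaluation metrics named above can be illustrated with a minimal sketch. It assumes that a predicted (item, score, relation) triple counts as correct only when it exactly matches a gold triple; the abstract does not state the matching criterion, and the function name `precision_recall_f1` and the sample triples are hypothetical.

```python
from typing import Iterable, Set, Tuple

# (item text, score text, relation label); this layout is an assumption
Triple = Tuple[str, str, str]


def precision_recall_f1(
    gold: Iterable[Triple], predicted: Iterable[Triple]
) -> Tuple[float, float, float]:
    """Compute precision, recall and F1 under exact-match scoring."""
    gold_set: Set[Triple] = set(gold)
    pred_set: Set[Triple] = set(predicted)
    true_positives = len(gold_set & pred_set)
    # Guard against empty sets so the ratios are always defined.
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical gold and predicted triples for one discharge summary.
gold = [
    ("1a level of consciousness", "0", "has value"),
    ("1b level of consciousness questions", "1", "has value"),
    ("2 best gaze", "0", "has value"),
]
predicted = [
    ("1b level of consciousness questions", "1", "has value"),
    ("2 best gaze", "1", "has value"),  # wrong score, so not a match
]

p, r, f = precision_recall_f1(gold, predicted)
print(f"precision={p:.4f} recall={r:.4f} f1={f:.4f}")
```

Exact matching is the strictest choice; a relaxed criterion (for example, overlapping item spans) would give higher scores on the same predictions.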

RESULTS

We use all available discharge summaries of stroke cases in MIMIC-III. The annotated NIHSS corpus contains 312 cases, 2929 scale items, 2774 scores and 2733 relations. The results show that the best F1-score of our method was 0.9006, which was attained by combining BERT-BiLSTM-CRF and Random Forest, and it outperformed the rule-based method (F1-score = 0.8098). In the end-to-end task, our method could successfully recognize the item "1b level of consciousness questions", the score "1" and their relation "('1b level of consciousness questions', '1', 'has value')" from the sentence "1b level of consciousness questions: said name = 1", while the rule-based method could not.
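To illustrate why a pattern-based rule can miss the example sentence, here is a minimal sketch of a strict extraction rule. The abstract does not describe the authors' actual rules, so the pattern `STRICT_RULE` and the helper `extract_with_rule` are hypothetical. The rule expects the score to follow the item name directly, so the intervening free text "said name" defeats it.

```python
import re
from typing import Optional, Tuple

# Hypothetical rule: "<item id> <item name>" then ":" or "=" then one digit.
STRICT_RULE = re.compile(
    r"^(?P<item>\d{1,2}[a-c]?\s+[A-Za-z ]+?)\s*[:=]\s*(?P<score>\d)\s*$"
)


def extract_with_rule(sentence: str) -> Optional[Tuple[str, str, str]]:
    """Return an (item, score, 'has value') triple, or None if no match."""
    match = STRICT_RULE.match(sentence.strip())
    if match is None:
        return None
    return (match.group("item").strip(), match.group("score"), "has value")


# A tidy sentence matches the rule.
print(extract_with_rule("1a level of consciousness: 0"))

# The sentence quoted in the abstract has free text between the item
# and the score, so this strict rule returns None.
print(extract_with_rule("1b level of consciousness questions: said name = 1"))
```

Loosening the pattern to tolerate arbitrary text between item and score would catch this sentence but would also risk pairing items with unrelated numbers, which is the kind of trade-off a learned relation classifier is meant to handle.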

CONCLUSIONS

The two-step pipeline method we propose is an effective approach to identify NIHSS items, scores and their relations. With its help, clinical investigators can easily retrieve and access structured scale data, thereby supporting stroke-related real-world studies.
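As a rough sketch of what "structured scale data" could look like downstream, the extracted triples might be grouped into one record per case. The tuple layout, the `case_id` field and the function `to_structured_records` are assumptions made for illustration and are not taken from the paper.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# (case id, item, score, relation); the case id field is hypothetical
Extraction = Tuple[str, str, str, str]


def to_structured_records(
    extractions: Iterable[Extraction],
) -> Dict[str, Dict[str, int]]:
    """Group 'has value' relations into {case_id: {item: score}}."""
    records: Dict[str, Dict[str, int]] = defaultdict(dict)
    for case_id, item, score, relation in extractions:
        if relation != "has value":
            continue  # ignore any other relation label
        if not score.isdigit():
            continue  # skip non-numeric scores in this sketch
        records[case_id][item] = int(score)
    return dict(records)


# Hypothetical pipeline output for two cases.
extractions = [
    ("case-001", "1a level of consciousness", "0", "has value"),
    ("case-001", "1b level of consciousness questions", "1", "has value"),
    ("case-002", "2 best gaze", "2", "has value"),
]

records = to_structured_records(extractions)
print(records["case-001"]["1b level of consciousness questions"])
```

A per-case dictionary keyed by item name makes it straightforward to query a single item across cases or to export the records as a table.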
