Combine Factual Medical Knowledge and Distributed Word Representation to Improve Clinical Named Entity Recognition.

Authors

Wu Yonghui, Yang Xi, Bian Jiang, Guo Yi, Xu Hua, Hogan William

Affiliations

Departments of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, Florida, USA.

School of Biomedical Informatics, the University of Texas Health Science Center at Houston, Houston, Texas, USA.

Publication information

AMIA Annu Symp Proc. 2018 Dec 5;2018:1110-1117. eCollection 2018.

PMID:30815153

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC6371322/

Abstract

There has been an increasing interest in developing deep learning methods to recognize clinical concepts from narrative clinical text. Recently, several studies have reported that Recurrent Neural Networks (RNNs) outperformed traditional machine learning methods such as Conditional Random Fields (CRFs). Deep learning-based Named Entity Recognition (NER) systems often use statistical language models to learn word embeddings from unlabeled corpora. However, current word embedding methods have limitations to learn decent representations for low-frequency words. Medicine is a knowledge-extensive domain; existing medical knowledge has the potential to improve feature representations for less frequent yet important words. However, it is still not clear how existing medical knowledge can help deep learning models in clinical NER tasks. In this study, we integrated medical knowledge from the Unified Medical Language System with word embeddings trained from an unlabeled clinical corpus in RNNs for detection of problems, treatments and lab tests. We examined three different ways to generate medical knowledge features, including a dictionary lookup program, the KnowledgeMap system, and the MedLEE system. We also compared representing medical knowledge as one-hot vectors versus representing medical knowledge as embedding layers. The evaluation results showed that the RNN with medical knowledge as embedding layers achieved new state-of-the-art performance (a strict F1 score of 86.21% and a relaxed F1 score of 92.80%) on the 2010 i2b2 corpus, outperforming an RNN with only word embeddings and RNNs with medical knowledge as one-hot vectors. This study demonstrated an efficient way of integrating medical knowledge with distributed word representations for clinical NER.
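
The contrast the abstract draws between one-hot vectors and embedding layers can be illustrated with a minimal sketch. It is not the authors' implementation: the tag inventory, the vector sizes, and the function names (`one_hot`, `make_embedding_table`, `token_input`) are assumptions made for illustration, and the embedding table is only randomly initialized here, whereas in a real model it would be updated during training together with the RNN.

```python
import random

# Hypothetical knowledge tags that a lookup tool might assign to a token.
# The abstract does not list the actual feature inventory.
KNOWLEDGE_TAGS = ["none", "problem", "treatment", "test"]


def one_hot(tag):
    """Fixed, sparse representation: one dimension per tag."""
    vec = [0.0] * len(KNOWLEDGE_TAGS)
    vec[KNOWLEDGE_TAGS.index(tag)] = 1.0
    return vec


def make_embedding_table(dim, seed=0):
    """Dense lookup table with one row per tag (random initial values)."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in KNOWLEDGE_TAGS]


def embed(tag, table):
    """Return a copy of the table row that belongs to the tag."""
    return list(table[KNOWLEDGE_TAGS.index(tag)])


def token_input(word_vector, tag, table=None):
    """Concatenate a word embedding with the knowledge feature.

    Without a table the tag is one-hot encoded; with a table it is
    looked up as a dense vector.
    """
    knowledge = one_hot(tag) if table is None else embed(tag, table)
    return list(word_vector) + knowledge


# Toy 5-dimensional word embedding for a single token (assumed values).
word_vector = [0.2, -0.1, 0.4, 0.0, 0.3]
table = make_embedding_table(dim=3)

as_one_hot = token_input(word_vector, "treatment")
as_embedding = token_input(word_vector, "treatment", table)
print(len(as_one_hot), len(as_embedding))
```

In both variants the knowledge feature is appended to the word embedding before it reaches the recurrent layer. The difference is that the one-hot width is fixed by the number of tags and its values never change, while the embedding width is a free choice and its values can be learned.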
