Combine Factual Medical Knowledge and Distributed Word Representation to Improve Clinical Named Entity Recognition.

Author information

Wu Yonghui, Yang Xi, Bian Jiang, Guo Yi, Xu Hua, Hogan William

Affiliations

Departments of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, Florida, USA.

School of Biomedical Informatics, the University of Texas Health Science Center at Houston, Houston, Texas, USA.

Publication information

AMIA Annu Symp Proc. 2018 Dec 5;2018:1110-1117. eCollection 2018.

Abstract

There is increasing interest in developing deep learning methods to recognize clinical concepts in narrative clinical text. Several recent studies have reported that Recurrent Neural Networks (RNNs) outperform traditional machine learning methods such as Conditional Random Fields (CRFs). Deep learning-based Named Entity Recognition (NER) systems often use statistical language models to learn word embeddings from unlabeled corpora. However, current word embedding methods are limited in their ability to learn good representations for low-frequency words. Medicine is a knowledge-intensive domain, and existing medical knowledge has the potential to improve feature representations for less frequent yet important words. It remains unclear, however, how existing medical knowledge can help deep learning models in clinical NER tasks. In this study, we integrated medical knowledge from the Unified Medical Language System with word embeddings trained on an unlabeled clinical corpus, and used both in RNNs to detect problems, treatments, and lab tests. We examined three ways of generating medical knowledge features: a dictionary lookup program, the KnowledgeMap system, and the MedLEE system. We also compared representing medical knowledge as one-hot vectors with representing it as embedding layers. The evaluation showed that the RNN with medical knowledge as embedding layers achieved new state-of-the-art performance on the 2010 i2b2 corpus (a strict F1 score of 86.21% and a relaxed F1 score of 92.80%), outperforming both an RNN with only word embeddings and RNNs with medical knowledge as one-hot vectors. This study demonstrates an efficient way of integrating medical knowledge with distributed word representations for clinical NER.
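To make the comparison between one-hot vectors and embedding layers concrete, the following is a minimal Python sketch of the two ways a per-token knowledge feature could be attached to a word embedding. It is an illustration under stated assumptions, not the authors' implementation: the tag set, the vector sizes, the concatenation step, and all function names are hypothetical. In a real model the embedding table would be learned during training rather than left at its random initial values.

```python
import random

# Hypothetical BIO-style knowledge tags, e.g. as emitted by a dictionary lookup.
# The actual feature inventory used in the paper is not given in the abstract.
KNOWLEDGE_TAGS = ["O", "B-problem", "I-problem",
                  "B-treatment", "I-treatment", "B-test", "I-test"]
TAG_INDEX = {tag: i for i, tag in enumerate(KNOWLEDGE_TAGS)}


def one_hot(tag):
    """Fixed sparse vector: one dimension per tag, never updated by training."""
    vec = [0.0] * len(KNOWLEDGE_TAGS)
    vec[TAG_INDEX[tag]] = 1.0
    return vec


def make_embedding_table(dim, seed=0):
    """Dense lookup table with one row per tag (random init; trainable in a real model)."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in KNOWLEDGE_TAGS]


def embed(tag, table):
    """Embedding-layer lookup: select the dense row for this tag."""
    return table[TAG_INDEX[tag]]


def build_input(word_vec, knowledge_vec):
    """Assumed combination step: concatenate word and knowledge features per token."""
    return list(word_vec) + list(knowledge_vec)


word_vec = [0.2, -0.4, 0.7]            # toy 3-dimensional word embedding
table = make_embedding_table(dim=4)    # toy 4-dimensional knowledge embedding

x_one_hot = build_input(word_vec, one_hot("B-problem"))
x_embedded = build_input(word_vec, embed("B-problem", table))

print(len(x_one_hot), len(x_embedded))  # 10 7
```

The one-hot variant grows with the number of tags and treats every tag as equally distant from every other, whereas the embedding variant has a fixed width and lets training place related tags near each other.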
