Recurrent neural networks with specialized word embeddings for health-domain named-entity recognition.

Author information

University of Technology Sydney (UTS), Australia; Capital Markets Cooperative Research Centre (CMCRC), Australia.

Capital Markets Cooperative Research Centre (CMCRC), Australia.

Publication information

J Biomed Inform. 2017 Dec;76:102-109. doi: 10.1016/j.jbi.2017.11.007. Epub 2017 Nov 13.

DOI:10.1016/j.jbi.2017.11.007

PMID:29146561

Abstract

BACKGROUND

Previous state-of-the-art systems for Drug Name Recognition (DNR) and Clinical Concept Extraction (CCE) have combined text "feature engineering" with conventional machine learning algorithms such as conditional random fields and support vector machines. However, developing good features is inherently time-consuming. By contrast, more modern machine learning approaches such as recurrent neural networks (RNNs) have proved capable of automatically learning effective features, starting from either random assignments or automated word "embeddings".

OBJECTIVES

(i) To create a highly accurate DNR and CCE system that avoids conventional, time-consuming feature engineering. (ii) To create richer, more specialized word embeddings by using health domain datasets such as MIMIC-III. (iii) To evaluate our systems over three contemporary datasets.

METHODS

Two deep learning methods, namely the Bidirectional LSTM and the Bidirectional LSTM-CRF, are evaluated. A CRF model is set as the baseline to compare the deep learning systems to a traditional machine learning approach. The same features are used for all the models.
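The difference between the two deep learning models can be illustrated with a small decoding sketch. A Bidirectional LSTM on its own picks a tag for each token independently, while a CRF output layer also scores tag-to-tag transitions and selects the best whole sequence, typically with Viterbi decoding. The code below is a minimal sketch under stated assumptions, not the authors' implementation: the tag set, the emission scores (standing in for BiLSTM outputs), and the transition scores are all made-up illustrative values.

```python
from typing import Dict, List, Sequence


def greedy_decode(emissions: Sequence[Dict[str, float]], tags: Sequence[str]) -> List[str]:
    """Pick the best tag per token independently (no CRF layer)."""
    return [max(tags, key=lambda tag: step[tag]) for step in emissions]


def viterbi_decode(
    emissions: Sequence[Dict[str, float]],
    transitions: Dict[str, Dict[str, float]],
    tags: Sequence[str],
) -> List[str]:
    """Return the highest-scoring tag sequence under a linear-chain CRF.

    emissions[t][tag]  : per-token score for `tag` (e.g. produced by a BiLSTM)
    transitions[a][b]  : score for moving from tag `a` to tag `b`
    """
    if not emissions:
        return []
    # best[t][tag] is the best score of any path that ends in `tag` at token t.
    best = [{tag: emissions[0][tag] for tag in tags}]
    back: List[Dict[str, str]] = []
    for t in range(1, len(emissions)):
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for cur in tags:
            prev = max(tags, key=lambda p: best[-1][p] + transitions[p][cur])
            scores[cur] = best[-1][prev] + transitions[prev][cur] + emissions[t][cur]
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final tag to recover the path.
    path = [max(tags, key=lambda tag: best[-1][tag])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical BIO tag set and scores for the tokens: took / acetylsalicylic / acid
TAGS = ["O", "B-DRUG", "I-DRUG"]
EMISSIONS = [
    {"O": 2.0, "B-DRUG": 0.0, "I-DRUG": 0.0},
    {"O": 0.0, "B-DRUG": 1.0, "I-DRUG": 1.2},
    {"O": 0.0, "B-DRUG": 0.0, "I-DRUG": 2.0},
]
# All transitions score 0 except O -> I-DRUG, which is invalid in BIO tagging.
TRANSITIONS = {a: {b: 0.0 for b in TAGS} for a in TAGS}
TRANSITIONS["O"]["I-DRUG"] = -10.0

greedy_path = greedy_decode(EMISSIONS, TAGS)
crf_path = viterbi_decode(EMISSIONS, TRANSITIONS, TAGS)
```

With these toy scores, independent decoding starts the drug mention with an I-DRUG tag directly after O, whereas the transition penalty steers Viterbi decoding to a well-formed B-DRUG, I-DRUG span. In a trained BiLSTM-CRF the transition scores are learned jointly with the network rather than set by hand.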

RESULTS

We have obtained the best results with the Bidirectional LSTM-CRF model, which has outperformed all previously proposed systems. The specialized embeddings have helped to cover unusual words in DrugBank and MedLine, but not in the i2b2/VA dataset.
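One simple way to quantify how well a set of embeddings "covers" a dataset is the fraction of distinct tokens that have a vector in the embedding vocabulary. The sketch below assumes this definition, which the abstract does not spell out; the function name, the sentence, and both vocabularies are hypothetical illustrations rather than data from the study.

```python
from typing import Iterable, Set


def vocabulary_coverage(tokens: Iterable[str], embedding_vocab: Set[str]) -> float:
    """Fraction of distinct lowercased tokens that have an embedding."""
    types = {tok.lower() for tok in tokens}
    if not types:
        return 0.0
    return len(types & embedding_vocab) / len(types)


# Hypothetical sentence and vocabularies, for illustration only.
sentence = "The patient was given warfarin for thrombosis".split()
general_vocab = {"the", "patient", "was", "given", "for"}
specialized_vocab = general_vocab | {"warfarin", "thrombosis"}

general_cov = vocabulary_coverage(sentence, general_vocab)
specialized_cov = vocabulary_coverage(sentence, specialized_vocab)
```

Here the general-purpose vocabulary misses the two domain terms, while the vocabulary extended with health-domain words covers every token. Tokens without a vector would normally fall back to a random or shared "unknown" embedding.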

CONCLUSIONS

We present a state-of-the-art system for DNR and CCE. Automated word embeddings have allowed us to avoid costly feature engineering and achieve higher accuracy. Nevertheless, the embeddings need to be retrained on datasets suited to the domain in order to adequately cover the domain-specific vocabulary.
