Combinatorial feature embedding based on CNN and LSTM for biomedical named entity recognition.

Affiliation

Yonsei University, Department of Computer Science, Republic of Korea.

Publication information

J Biomed Inform. 2020 Mar;103:103381. doi: 10.1016/j.jbi.2020.103381. Epub 2020 Jan 28.

DOI:10.1016/j.jbi.2020.103381

PMID:32004641

Abstract

With the rapid advancement of technology and the necessity of processing large amounts of data, biomedical Named Entity Recognition (NER) has become an essential technique for information extraction in the biomedical field. NER, which is a sequence-labeling task, has been performed using various techniques, including dictionary-, rule-, machine learning-, and deep learning-based methods. However, as existing biomedical NER models are insufficient to handle new and unseen entity types from the growing biomedical data, the development of more effective and accurate biomedical NER models is being widely researched. Among biomedical NER models utilizing deep learning approaches, only a few studies have involved the design of high-level features in the embedding layer. In this regard, we propose a deep learning NER model that effectively represents biomedical word tokens through the design of a combinatorial feature embedding. The proposed model is based on Bidirectional Long Short-Term Memory (bi-LSTM) with Conditional Random Field (CRF) and is enhanced by integrating two different character-level representations extracted from a Convolutional Neural Network (CNN) and a bi-LSTM. Additionally, an attention mechanism is applied to the model to focus on the relevant tokens in the sentence, which alleviates the long-term dependency problem of the LSTM model and allows effective recognition of entities. The proposed model was evaluated on two benchmark datasets, JNLPBA and NCBI-Disease, and a comparative analysis with existing models was performed. The proposed model achieved relatively higher performance on NCBI-Disease, with an F1-score of 86.93%, and competitive performance on JNLPBA, with an F1-score of 75.31%.
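To make the sequence-labeling framing and the reported F1-scores concrete, here is a minimal sketch of how entity-level F1 could be computed from BIO-tagged output. The abstract does not state the tagging scheme or the scoring script, so the BIO scheme, the exact-match criterion, and the names `bio_to_spans` and `entity_f1` are all assumptions made for illustration, not the authors' evaluation code.

```python
from typing import List, Set, Tuple

# (start index, end index exclusive, entity type); a hypothetical span format
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from one sentence of BIO tags."""
    spans: Set[Span] = set()
    start, etype = None, None
    # A trailing "O" sentinel closes any entity that runs to the sentence end.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and etype == tag[2:]
        if continues:
            continue
        if etype is not None:
            spans.add((start, i, etype))
        start, etype = None, None
        # Assumption: a stray "I-" tag is treated leniently as opening an entity.
        if tag.startswith("B-") or tag.startswith("I-"):
            start, etype = i, tag[2:]
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Exact-match precision, recall, and F1 over all sentences."""
    tp = n_gold = n_pred = 0
    for gold_tags, pred_tags in zip(gold, pred):
        gold_spans = bio_to_spans(gold_tags)
        pred_spans = bio_to_spans(pred_tags)
        tp += len(gold_spans & pred_spans)  # same boundaries and same type
        n_gold += len(gold_spans)
        n_pred += len(pred_spans)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy example: two gold entities, one of which the prediction recovers.
gold = [["B-Disease", "I-Disease", "O", "B-Disease"]]
pred = [["B-Disease", "I-Disease", "O", "O"]]
precision, recall, f1 = entity_f1(gold, pred)
print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")
```

Under exact matching, a prediction counts only if both boundaries and the entity type agree with the gold annotation, which is why a partially recovered mention contributes nothing to the score.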

Similar articles

Combinatorial feature embedding based on CNN and LSTM for biomedical named entity recognition.

J Biomed Inform. 2020 Mar;103:103381. doi: 10.1016/j.jbi.2020.103381. Epub 2020 Jan 28.

A deep learning model incorporating part of speech and self-matching attention for named entity recognition of Chinese electronic medical records.

BMC Med Inform Decis Mak. 2019 Apr 9;19(Suppl 2):65. doi: 10.1186/s12911-019-0762-7.

Character level and word level embedding with bidirectional LSTM - Dynamic recurrent neural network for biomedical named entity recognition from literature.

J Biomed Inform. 2020 Dec;112:103609. doi: 10.1016/j.jbi.2020.103609. Epub 2020 Oct 26.

Biomedical named entity recognition using deep neural networks with contextual information.

BMC Bioinformatics. 2019 Dec 27;20(1):735. doi: 10.1186/s12859-019-3321-4.

Medical Named Entity Extraction from Chinese Resident Admit Notes Using Character and Word Attention-Enhanced Neural Network.

Int J Environ Res Public Health. 2020 Mar 2;17(5):1614. doi: 10.3390/ijerph17051614.

Long short-term memory RNN for biomedical named entity recognition.

BMC Bioinformatics. 2017 Oct 30;18(1):462. doi: 10.1186/s12859-017-1868-5.

Ontology-Based Healthcare Named Entity Recognition from Twitter Messages Using a Recurrent Neural Network Approach.

Int J Environ Res Public Health. 2019 Sep 27;16(19):3628. doi: 10.3390/ijerph16193628.

Entity recognition from clinical texts via recurrent neural network.

BMC Med Inform Decis Mak. 2017 Jul 5;17(Suppl 2):67. doi: 10.1186/s12911-017-0468-7.

Negation-based transfer learning for improving biomedical Named Entity Recognition and Relation Extraction.

J Biomed Inform. 2023 Feb;138:104279. doi: 10.1016/j.jbi.2022.104279. Epub 2023 Jan 4.

Entity recognition in Chinese clinical text using attention-based CNN-LSTM-CRF.

BMC Med Inform Decis Mak. 2019 Apr 4;19(Suppl 3):74. doi: 10.1186/s12911-019-0787-y.

Cited by

Sentences, entities, and keyphrases extraction from consumer health forums using multi-task learning.

J Biomed Semantics. 2025 May 6;16(1):8. doi: 10.1186/s13326-025-00329-2.

Few-shot biomedical NER empowered by LLMs-assisted data augmentation and multi-scale feature extraction.

BioData Min. 2025 Apr 4;18(1):28. doi: 10.1186/s13040-025-00443-y.

DeepWalk-Based Graph Embeddings for miRNA-Disease Association Prediction Using Deep Neural Network.

Biomedicines. 2025 Feb 20;13(3):536. doi: 10.3390/biomedicines13030536.

Graph Convolutional Network with Neural Collaborative Filtering for Predicting miRNA-Disease Association.

Biomedicines. 2025 Jan 8;13(1):136. doi: 10.3390/biomedicines13010136.

Attention-based interactive multi-level feature fusion for named entity recognition.

Sci Rep. 2025 Jan 24;15(1):3069. doi: 10.1038/s41598-025-86718-0.

Biomedical named entity recognition based on multi-cross attention feature fusion.

PLoS One. 2024 May 28;19(5):e0304329. doi: 10.1371/journal.pone.0304329. eCollection 2024.

Vocabulary Matters: An Annotation Pipeline and Four Deep Learning Algorithms for Enzyme Named Entity Recognition.

J Proteome Res. 2024 Jun 7;23(6):1915-1925. doi: 10.1021/acs.jproteome.3c00367. Epub 2024 May 11.

A deep learning approach for Named Entity Recognition in Urdu language.

PLoS One. 2024 Mar 28;19(3):e0300725. doi: 10.1371/journal.pone.0300725. eCollection 2024.

A Review on Electronic Health Record Text-Mining for Biomedical Name Entity Recognition in Healthcare Domain.

Healthcare (Basel). 2023 Apr 28;11(9):1268. doi: 10.3390/healthcare11091268.

Clinical concept recognition: Evaluation of existing systems on EHRs.

Front Artif Intell. 2023 Jan 13;5:1051724. doi: 10.3389/frai.2022.1051724. eCollection 2022.
