BioBBC: a multi-feature model that enhances the detection of biomedical entities.

Affiliations

Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia.

Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia.

Publication information

Sci Rep. 2024 Apr 2;14(1):7697. doi: 10.1038/s41598-024-58334-x.

DOI:10.1038/s41598-024-58334-x

PMID:38565624

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10987643/

Abstract

The rapid growth of biomedical publications calls for efficient systems that automatically perform Biomedical Named Entity Recognition (BioNER) on unstructured text. Accurately detecting biomedical entities is challenging, however, because their names are complex and abbreviations are used frequently. In this paper, we propose BioBBC, a deep learning (DL) model that uses multi-feature embeddings and is built on the BERT-BiLSTM-CRF architecture to address the BioNER task. BioBBC consists of three main layers: an embedding layer, a Bidirectional Long Short-Term Memory (BiLSTM) layer, and a Conditional Random Field (CRF) layer. BioBBC takes sentences from the biomedical domain as input and identifies the biomedical entities mentioned in the text. The embedding layer generates enriched contextual representation vectors of the input from four types of embeddings: part-of-speech (POS) tag embedding, character-level embedding, BERT embedding, and data-specific embedding. The BiLSTM layer produces additional syntactic and semantic feature representations. Finally, the CRF layer identifies the best possible tag sequence for the input sentence. The model is designed and optimized to detect different types of biomedical entities. In experiments on six benchmark BioNER datasets, it outperformed state-of-the-art (SOTA) models with significant improvements.
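The two ends of the pipeline described in the abstract can be illustrated with a minimal pure-Python sketch: joining four per-token feature vectors into one input vector, and decoding the best tag sequence with the Viterbi algorithm, which is the standard way a linear-chain CRF finds its highest-scoring path. This is not the authors' implementation. The tag set, function names, and all scores are hypothetical toy values, the BiLSTM is omitted, and the emission scores simply stand in for what a BiLSTM would produce.

```python
from typing import List, Sequence

TAGS = ["O", "B-Disease", "I-Disease"]  # hypothetical BIO tag set


def concat_features(pos: Sequence[float], char: Sequence[float],
                    bert: Sequence[float], data: Sequence[float]) -> List[float]:
    """Join the four per-token feature vectors into one input vector."""
    return [*pos, *char, *bert, *data]


def viterbi(emissions: List[List[float]],
            transitions: List[List[float]]) -> List[int]:
    """Return the highest-scoring sequence of tag indices.

    emissions[t][j]   : score of tag j at position t
    transitions[i][j] : score of moving from tag i to tag j
    """
    n_tags = len(transitions)
    score = list(emissions[0])          # best score ending in each tag so far
    backptr: List[List[int]] = []       # best previous tag for each step
    for t in range(1, len(emissions)):
        new_score, ptr = [], []
        for j in range(n_tags):
            best_i = max(range(n_tags),
                         key=lambda i: score[i] + transitions[i][j])
            ptr.append(best_i)
            new_score.append(score[best_i] + transitions[best_i][j]
                             + emissions[t][j])
        score = new_score
        backptr.append(ptr)
    # Trace back from the best final tag.
    path = [max(range(n_tags), key=lambda j: score[j])]
    for ptr in reversed(backptr):
        path.append(ptr[path[-1]])
    path.reverse()
    return path


# Toy scores for the tokens "treats breast cancer" (assumed values).
EMISSIONS = [
    [2.0, 0.1, 0.0],   # "treats"
    [0.5, 1.0, 1.2],   # "breast": the per-token argmax would pick I-Disease
    [0.3, 0.2, 1.5],   # "cancer"
]
TRANSITIONS = [
    [0.0, 0.0, -10.0],  # from O: moving straight to I-Disease is penalized
    [0.0, 0.0, 0.5],    # from B-Disease
    [0.0, 0.0, 0.5],    # from I-Disease
]

decoded = [TAGS[i] for i in viterbi(EMISSIONS, TRANSITIONS)]
print(decoded)  # ['O', 'B-Disease', 'I-Disease']
```

Picking the top-scoring tag for each token independently would label "breast" as I-Disease directly after O, which is not a valid BIO sequence. Scoring the whole sequence with transition terms avoids this, and that is the role the abstract assigns to the CRF layer.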

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ef54/10987643/20d135464a86/41598_2024_58334_Fig1_HTML.jpg
