BioBBC: a multi-feature model that enhances the detection of biomedical entities.

Affiliations

Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia.

Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia.

Publication information

Sci Rep. 2024 Apr 2;14(1):7697. doi: 10.1038/s41598-024-58334-x.

DOI:10.1038/s41598-024-58334-x
PMID:38565624
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10987643/
Abstract

The rapid increase in biomedical publications necessitates efficient systems that automatically handle Biomedical Named Entity Recognition (BioNER) in unstructured text. However, accurately detecting biomedical entities is challenging because their names are complex and abbreviations are used frequently. In this paper, we propose BioBBC, a deep learning (DL) model that uses multi-feature embeddings and is built on the BERT-BiLSTM-CRF architecture to address the BioNER task. BioBBC consists of three main layers: an embedding layer, a Bidirectional Long Short-Term Memory (BiLSTM) layer, and a Conditional Random Fields (CRF) layer. BioBBC takes sentences from the biomedical domain as input and identifies the biomedical entities mentioned in the text. The embedding layer generates enriched contextual representation vectors of the input by combining four types of embeddings: part-of-speech (POS) tag embedding, character-level embedding, BERT embedding, and data-specific embedding. The BiLSTM layer produces additional syntactic and semantic feature representations. Finally, the CRF layer identifies the best possible tag sequence for the input sentence. The model is designed and optimized to detect different types of biomedical entities. In experiments on six benchmark BioNER datasets, it outperformed state-of-the-art (SOTA) models with significant improvements.
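
The last two steps of the pipeline described in the abstract can be illustrated with a small, dependency-free sketch: per-token feature vectors are concatenated into one representation, and a linear-chain CRF score is decoded with the Viterbi algorithm to pick the best tag sequence. This is not the authors' implementation. The function names, the B/I/O tag set, the toy vector sizes, and all scores are assumptions chosen for illustration; in the real model the emission scores would come from the BiLSTM layer and the transition scores would be learned.

```python
from typing import Dict, List, Sequence, Tuple


def concat_features(*feature_vectors: Sequence[float]) -> List[float]:
    """Join per-token feature vectors (e.g. POS, char, BERT, data-specific) into one."""
    merged: List[float] = []
    for vec in feature_vectors:
        merged.extend(vec)
    return merged


def viterbi_decode(
    emissions: List[Dict[str, float]],
    transitions: Dict[Tuple[str, str], float],
    tags: List[str],
) -> List[str]:
    """Return the highest-scoring tag sequence under a linear-chain CRF score."""
    # best[t][tag] holds the best score of any path that ends in `tag` at position t.
    best = [{tag: emissions[0][tag] for tag in tags}]
    back: List[Dict[str, str]] = []
    for t in range(1, len(emissions)):
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for cur in tags:
            # Pick the previous tag that maximizes path score plus transition score.
            prev = max(tags, key=lambda p: best[-1][p] + transitions.get((p, cur), 0.0))
            scores[cur] = best[-1][prev] + transitions.get((prev, cur), 0.0) + emissions[t][cur]
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final tag to recover the path.
    path = [max(tags, key=lambda tag: best[-1][tag])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Hypothetical tag set and scores (assumptions, not values from the paper).
TAGS = ["O", "B", "I"]
TRANSITIONS = {("O", "I"): -10.0}  # discourage an inside tag right after an outside tag

# Toy token representation: four small feature vectors joined into one.
token_vector = concat_features([0.1], [0.2, 0.3], [0.4, 0.5, 0.6], [0.7])

# Hand-written emission scores for a three-token sentence.
emissions = [
    {"O": 0.1, "B": 2.0, "I": 0.5},
    {"O": 0.4, "B": 0.2, "I": 1.5},
    {"O": 2.0, "B": 0.1, "I": 0.3},
]
print(len(token_vector))                              # → 7
print(viterbi_decode(emissions, TRANSITIONS, TAGS))   # → ['B', 'I', 'O']
```

The transition penalty is what separates sequence decoding from picking the top tag per token: with it, a locally attractive `I` tag is only chosen when a valid predecessor makes the whole path score higher.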


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ef54/10987643/20d135464a86/41598_2024_58334_Fig1_HTML.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ef54/10987643/f0d9f9886790/41598_2024_58334_Fig2_HTML.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ef54/10987643/ed020886f925/41598_2024_58334_Fig3_HTML.jpg
Figure 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ef54/10987643/220c4321bb6c/41598_2024_58334_Fig4_HTML.jpg

Similar articles

1. BioBBC: a multi-feature model that enhances the detection of biomedical entities.
   Sci Rep. 2024 Apr 2;14(1):7697. doi: 10.1038/s41598-024-58334-x.
2. BioByGANS: biomedical named entity recognition by fusing contextual and syntactic features through graph attention network in node classification framework.
   BMC Bioinformatics. 2022 Nov 22;23(1):501. doi: 10.1186/s12859-022-05051-9.
3. Extracting clinical named entity for pituitary adenomas from Chinese electronic medical records.
   BMC Med Inform Decis Mak. 2022 Mar 23;22(1):72. doi: 10.1186/s12911-022-01810-z.
4. Biomedical named entity recognition based on fusion multi-features embedding.
   Technol Health Care. 2023;31(S1):111-121. doi: 10.3233/THC-236011.
5. A deep learning model incorporating part of speech and self-matching attention for named entity recognition of Chinese electronic medical records.
   BMC Med Inform Decis Mak. 2019 Apr 9;19(Suppl 2):65. doi: 10.1186/s12911-019-0762-7.
6. Comparing general and specialized word embeddings for biomedical named entity recognition.
   PeerJ Comput Sci. 2021 Feb 18;7:e384. doi: 10.7717/peerj-cs.384. eCollection 2021.
7. Combinatorial feature embedding based on CNN and LSTM for biomedical named entity recognition.
   J Biomed Inform. 2020 Mar;103:103381. doi: 10.1016/j.jbi.2020.103381. Epub 2020 Jan 28.
8. Improving biomedical Named Entity Recognition with additional external contexts.
   J Biomed Inform. 2024 Aug;156:104674. doi: 10.1016/j.jbi.2024.104674. Epub 2024 Jun 11.
9. Extracting comprehensive clinical information for breast cancer using deep learning methods.
   Int J Med Inform. 2019 Dec;132:103985. doi: 10.1016/j.ijmedinf.2019.103985. Epub 2019 Oct 2.
10. Analyzing transfer learning impact in biomedical cross-lingual named entity recognition and normalization.
    BMC Bioinformatics. 2021 Dec 17;22(Suppl 1):601. doi: 10.1186/s12859-021-04247-9.

Cited by

1. Psychomedical named entity recognition method based on multi-level feature extraction and multi-granularity embedding fusion.
   Sci Rep. 2025 May 15;15(1):16927. doi: 10.1038/s41598-025-90939-8.
2. Biomedical named entity recognition using improved green anaconda-assisted Bi-GRU-based hierarchical ResNet model.
   BMC Bioinformatics. 2025 Jan 30;26(1):34. doi: 10.1186/s12859-024-06008-w.
