
Biomedical named entity recognition using BERT in the machine reading comprehension framework.

Affiliations

School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China.

Publication information

J Biomed Inform. 2021 Jun;118:103799. doi: 10.1016/j.jbi.2021.103799. Epub 2021 May 6.

DOI:10.1016/j.jbi.2021.103799
PMID:33965638
Abstract

Recognition of biomedical entities from literature is a challenging research focus, which is the foundation for extracting a large amount of biomedical knowledge existing in unstructured texts into structured formats. Using the sequence labeling framework to implement biomedical named entity recognition (BioNER) is currently a conventional method. This method, however, often cannot take full advantage of the semantic information in the dataset, and the performance is not always satisfactory. In this work, instead of treating the BioNER task as a sequence labeling problem, we formulate it as a machine reading comprehension (MRC) problem. This formulation can introduce more prior knowledge utilizing well-designed queries, and no longer needs decoding processes such as conditional random fields (CRF). We conduct experiments on six BioNER datasets, and the experimental results demonstrate the effectiveness of our method. Our method achieves state-of-the-art (SOTA) performance on the BC4CHEMD, BC5CDR-Chem, BC5CDR-Disease, NCBI-Disease, BC2GM and JNLPBA datasets, achieving F1-scores of 92.92%, 94.19%, 87.83%, 90.04%, 85.48% and 78.93%, respectively.
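To make the MRC formulation in the abstract more concrete, the following is a minimal Python sketch of how a sentence might be paired with one query per entity type, and how predicted start/end positions could be turned into entity spans without a CRF. It is an illustration under assumptions, not the authors' implementation: the query wording, the `build_mrc_examples` and `decode_spans` helpers, and the nearest-end matching rule are all hypothetical, and the start/end flags stand in for the output of a BERT-based model that is not shown.

```python
from typing import Dict, List, Tuple

# Hypothetical queries, one per entity type. The abstract does not give
# the actual query wording used in the paper.
QUERIES: Dict[str, str] = {
    "Chemical": "Find all chemical and drug names in the text.",
    "Disease": "Find all disease names in the text.",
}


def build_mrc_examples(tokens: List[str]) -> List[Dict[str, object]]:
    """Pair the same context with one query per entity type."""
    return [
        {"entity_type": etype, "query": query, "context": tokens}
        for etype, query in QUERIES.items()
    ]


def decode_spans(start_flags: List[int], end_flags: List[int]) -> List[Tuple[int, int]]:
    """Match each predicted start to the nearest predicted end at or after it.

    Returns inclusive (start, end) token indices. This matching rule is an
    assumption chosen for simplicity.
    """
    spans = []
    for i, is_start in enumerate(start_flags):
        if not is_start:
            continue
        for j in range(i, len(end_flags)):
            if end_flags[j]:
                spans.append((i, j))
                break
    return spans


tokens = "Aspirin can reduce the risk of heart attack .".split()
examples = build_mrc_examples(tokens)

# Made-up model output for the "Disease" query: one flag per token,
# with the span starting at "heart" and ending at "attack".
start = [0, 0, 0, 0, 0, 0, 1, 0, 0]
end = [0, 0, 0, 0, 0, 0, 0, 1, 0]

spans = decode_spans(start, end)
disease_mentions = [" ".join(tokens[s:e + 1]) for s, e in spans]
print(disease_mentions)
```

In this sketch the entity type is carried by the query rather than by a per-token label, which is how a query can inject prior knowledge about what to look for, and span boundaries are read off directly from the start and end predictions.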

Similar articles

1
Biomedical named entity recognition using BERT in the machine reading comprehension framework.
J Biomed Inform. 2021 Jun;118:103799. doi: 10.1016/j.jbi.2021.103799. Epub 2021 May 6.
2
BioByGANS: biomedical named entity recognition by fusing contextual and syntactic features through graph attention network in node classification framework.
BMC Bioinformatics. 2022 Nov 22;23(1):501. doi: 10.1186/s12859-022-05051-9.
3
DTranNER: biomedical named entity recognition with deep learning-based label-label transition model.
BMC Bioinformatics. 2020 Feb 11;21(1):53. doi: 10.1186/s12859-020-3393-1.
4
A prefix and attention map discrimination fusion guided attention for biomedical named entity recognition.
BMC Bioinformatics. 2023 Feb 8;24(1):42. doi: 10.1186/s12859-023-05172-9.
5
Application of machine reading comprehension techniques for named entity recognition in materials science.
J Cheminform. 2024 Jul 2;16(1):76. doi: 10.1186/s13321-024-00874-5.
6
Dictionary-based matching graph network for biomedical named entity recognition.
Sci Rep. 2023 Dec 8;13(1):21667. doi: 10.1038/s41598-023-48564-w.
7
Improving biomedical named entity recognition with syntactic information.
BMC Bioinformatics. 2020 Nov 25;21(1):539. doi: 10.1186/s12859-020-03834-6.
8
Hierarchical shared transfer learning for biomedical named entity recognition.
BMC Bioinformatics. 2022 Jan 4;23(1):8. doi: 10.1186/s12859-021-04551-4.
9
Biomedical named entity recognition with the combined feature attention and fully-shared multi-task learning.
BMC Bioinformatics. 2022 Nov 3;23(1):458. doi: 10.1186/s12859-022-04994-3.
10
Exploring the effects of drug, disease, and protein dependencies on biomedical named entity recognition: A comparative analysis.
Front Pharmacol. 2022 Dec 21;13:1020759. doi: 10.3389/fphar.2022.1020759. eCollection 2022.

Cited by

1
A diffusion enhanced CRF and BiLSTM framework for accurate entity recognition.
Sci Rep. 2025 Jun 4;15(1):19670. doi: 10.1038/s41598-025-04036-x.
2
Large Language Models in Biomedical and Health Informatics: A Review with Bibliometric Analysis.
J Healthc Inform Res. 2024 Sep 14;8(4):658-711. doi: 10.1007/s41666-024-00171-8. eCollection 2024 Dec.
3
Transformer models in biomedicine.
BMC Med Inform Decis Mak. 2024 Jul 29;24(1):214. doi: 10.1186/s12911-024-02600-5.
4
Application of machine reading comprehension techniques for named entity recognition in materials science.
J Cheminform. 2024 Jul 2;16(1):76. doi: 10.1186/s13321-024-00874-5.
5
TaeC: A manually annotated text dataset for trait and phenotype extraction and entity linking in wheat breeding literature.
PLoS One. 2024 Jun 13;19(6):e0305475. doi: 10.1371/journal.pone.0305475. eCollection 2024.
6
Vocabulary Matters: An Annotation Pipeline and Four Deep Learning Algorithms for Enzyme Named Entity Recognition.
J Proteome Res. 2024 Jun 7;23(6):1915-1925. doi: 10.1021/acs.jproteome.3c00367. Epub 2024 May 11.
7
BioBBC: a multi-feature model that enhances the detection of biomedical entities.
Sci Rep. 2024 Apr 2;14(1):7697. doi: 10.1038/s41598-024-58334-x.
8
A review on Natural Language Processing Models for COVID-19 research.
Healthc Anal (N Y). 2022 Nov;2:100078. doi: 10.1016/j.health.2022.100078. Epub 2022 Jul 19.
9
Nested Named Entity Recognition Based on Dual Stream Feature Complementation.
Entropy (Basel). 2022 Oct 12;24(10):1454. doi: 10.3390/e24101454.
10
Clinical concept and relation extraction using prompt-based machine reading comprehension.
J Am Med Inform Assoc. 2023 Aug 18;30(9):1486-1493. doi: 10.1093/jamia/ocad107.