Improving biomedical named entity recognition by dynamic caching inter-sentence information.

Affiliations

Institute of Artificial Intelligence, Beihang University, Beijing 100191, China.

SKLSDE, School of Computer Science, Beihang University, Beijing 100191, China.

Publication information

Bioinformatics. 2022 Aug 10;38(16):3976-3983. doi: 10.1093/bioinformatics/btac422.

DOI:10.1093/bioinformatics/btac422
PMID:35758612
Abstract

MOTIVATION

Biomedical Named Entity Recognition (BioNER) aims to identify biomedical domain-specific entities (e.g. genes, chemicals and diseases) in unstructured text. Although deep learning-based BioNER methods achieve satisfactory results, there is still much room for improvement. First, most existing methods use independent sentences as training units and ignore inter-sentence context, which often leads to inconsistent labeling of the same entity. Second, previous document-level BioNER work has shown that inter-sentence information is essential, but what information should be regarded as context remains unclear. Moreover, few pre-training-based BioNER models incorporate inter-sentence information. Hence, we propose a cache-based inter-sentence model, BioNER-Cache, to alleviate these problems.

RESULTS

We propose a simple but effective dynamic caching module that captures inter-sentence information for BioNER. Specifically, the cache stores recent hidden representations, subject to predefined caching rules. The model retrieves similar historical records from the cache through a query-and-read mechanism and uses them as local context. An attention-based gated network is then used together with BioBERT to generate context-related features. To update the cache dynamically, we design a scoring function and jointly train the model with a multi-task approach. We build a comprehensive benchmark on four biomedical datasets to evaluate model performance fairly. Finally, extensive experiments show that BioNER-Cache outperforms various state-of-the-art intra-sentence and inter-sentence baselines.
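The abstract describes the cache pipeline only at a high level. The following minimal sketch in pure Python illustrates the general idea. It is not the authors' implementation, and every name in it is hypothetical, including `DynamicCache`, `query_and_read`, `attend` and `gated_fuse`. It rests on several assumptions:

- Plain lists of floats stand in for BioBERT hidden states.
- A capacity-bounded FIFO buffer plays the role of the "predefined caching rules".
- A caller-supplied score compared against a threshold stands in for the learned scoring function.
- Cosine similarity implements the query step.
- Scaled dot-product attention pools the retrieved records.
- A single scalar sigmoid gate mixes the token vector with the pooled context.

```python
import math
from collections import deque
from typing import Deque, List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    # Cosine similarity; returns 0.0 if either vector is all zeros.
    norm = math.sqrt(dot(a, a)) * math.sqrt(dot(b, b))
    return dot(a, b) / norm if norm > 0 else 0.0


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class DynamicCache:
    """Hypothetical cache of recent hidden vectors with FIFO eviction."""

    def __init__(self, capacity: int) -> None:
        # deque(maxlen=...) drops the oldest entry once capacity is reached.
        self.entries: Deque[Vector] = deque(maxlen=capacity)

    def write(self, hidden: Sequence[float], score: float, threshold: float) -> bool:
        # Assumed caching rule: store a vector only if its score passes a threshold.
        if score < threshold:
            return False
        self.entries.append(list(hidden))
        return True

    def query_and_read(self, query: Sequence[float], top_k: int) -> List[Vector]:
        # Return the top_k cached vectors most similar (cosine) to the query.
        ranked = sorted(self.entries, key=lambda h: cosine(query, h), reverse=True)
        return ranked[:top_k]


def attend(query: Sequence[float], records: List[Vector]) -> Vector:
    # Scaled dot-product attention: softmax-weighted sum of retrieved records.
    if not records:
        return [0.0] * len(query)
    d = len(query)
    logits = [dot(query, r) / math.sqrt(d) for r in records]
    m = max(logits)  # subtract the max for a stable softmax
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    weights = [e / total for e in exps]
    return [sum(w * r[i] for w, r in zip(weights, records)) for i in range(d)]


def gated_fuse(token: Vector, context: Vector, gate_w: Vector, gate_b: float) -> Vector:
    # Scalar gate g = sigmoid(w . [token; context] + b); output = g*token + (1-g)*context.
    if len(gate_w) != len(token) + len(context):
        raise ValueError("gate_w must have length len(token) + len(context)")
    g = sigmoid(dot(gate_w, list(token) + list(context)) + gate_b)
    return [g * t + (1.0 - g) * c for t, c in zip(token, context)]
```

In this sketch the cache only ever holds `capacity` vectors, so retrieval cost stays bounded regardless of document length. In the described model, by contrast, the gate and the scoring function would be trained jointly with the tagger rather than fixed by hand as they are here.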

AVAILABILITY AND IMPLEMENTATION

Code will be available at https://github.com/zgzjdx/BioNER-Cache.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

Similar articles

1
Improving biomedical named entity recognition by dynamic caching inter-sentence information.
Bioinformatics. 2022 Aug 10;38(16):3976-3983. doi: 10.1093/bioinformatics/btac422.
2
BioByGANS: biomedical named entity recognition by fusing contextual and syntactic features through graph attention network in node classification framework.
BMC Bioinformatics. 2022 Nov 22;23(1):501. doi: 10.1186/s12859-022-05051-9.
3
Biomedical named entity recognition with the combined feature attention and fully-shared multi-task learning.
BMC Bioinformatics. 2022 Nov 3;23(1):458. doi: 10.1186/s12859-022-04994-3.
4
Augmenting biomedical named entity recognition with general-domain resources.
J Biomed Inform. 2024 Nov;159:104731. doi: 10.1016/j.jbi.2024.104731. Epub 2024 Oct 4.
5
DTranNER: biomedical named entity recognition with deep learning-based label-label transition model.
BMC Bioinformatics. 2020 Feb 11;21(1):53. doi: 10.1186/s12859-020-3393-1.
6
AIONER: all-in-one scheme-based biomedical named entity recognition using deep learning.
Bioinformatics. 2023 May 4;39(5). doi: 10.1093/bioinformatics/btad310.
7
Cross-type biomedical named entity recognition with deep multi-task learning.
Bioinformatics. 2019 May 15;35(10):1745-1752. doi: 10.1093/bioinformatics/bty869.
8
Transferring From Textual Entailment to Biomedical Named Entity Recognition.
IEEE/ACM Trans Comput Biol Bioinform. 2023 Jul-Aug;20(4):2577-2586. doi: 10.1109/TCBB.2023.3236477. Epub 2023 Aug 9.
9
Multitask learning for biomedical named entity recognition with cross-sharing structure.
BMC Bioinformatics. 2019 Aug 16;20(1):427. doi: 10.1186/s12859-019-3000-5.
10
Enhancing biomedical named entity recognition with parallel boundary detection and category classification.
BMC Bioinformatics. 2025 Feb 25;26(1):63. doi: 10.1186/s12859-025-06086-4.

Cited by

1
EnzChemRED, a rich enzyme chemistry relation extraction dataset.
Sci Data. 2024 Sep 9;11(1):982. doi: 10.1038/s41597-024-03835-7.
2
Application of machine reading comprehension techniques for named entity recognition in materials science.
J Cheminform. 2024 Jul 2;16(1):76. doi: 10.1186/s13321-024-00874-5.
3
AIONER: all-in-one scheme-based biomedical named entity recognition using deep learning.
Bioinformatics. 2023 May 4;39(5). doi: 10.1093/bioinformatics/btad310.