BioRED: a rich biomedical relation extraction dataset.

Affiliations

National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, MD 20894, USA.

University of Delaware, Newark, DE 19716, USA.

Publication information

Brief Bioinform. 2022 Sep 20;23(5). doi: 10.1093/bib/bbac282.

DOI: 10.1093/bib/bbac282
PMID: 35849818
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9487702/
Abstract

Automated relation extraction (RE) from biomedical literature is critical for many downstream text mining applications in both research and real-world settings. However, most existing benchmarking datasets for biomedical RE only focus on relations of a single type (e.g. protein-protein interactions) at the sentence level, greatly limiting the development of RE systems in biomedicine. In this work, we first review commonly used named entity recognition (NER) and RE datasets. Then, we present a first-of-its-kind biomedical relation extraction dataset (BioRED) with multiple entity types (e.g. gene/protein, disease, chemical) and relation pairs (e.g. gene-disease; chemical-chemical) at the document level, on a set of 600 PubMed abstracts. Furthermore, we label each relation as describing either a novel finding or previously known background knowledge, enabling automated algorithms to differentiate between novel and background information. We assess the utility of BioRED by benchmarking several existing state-of-the-art methods, including Bidirectional Encoder Representations from Transformers (BERT)-based models, on the NER and RE tasks. Our results show that while existing approaches can reach high performance on the NER task (F-score of 89.3%), there is much room for improvement for the RE task, especially when extracting novel relations (F-score of 47.7%). Our experiments also demonstrate that such a rich dataset can successfully facilitate the development of more accurate, efficient and robust RE systems for biomedicine. Availability: The BioRED dataset and annotation guidelines are freely available at https://ftp.ncbi.nlm.nih.gov/pub/lu/BioRED/.
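To make the reported F-scores concrete, the following is a minimal sketch of how a document-level relation with a novelty label might be represented and scored. The `Relation` fields, the identifier strings, the label values, and the `f_score` helper are all hypothetical illustrations; this is not the BioRED file format and not the authors' evaluation script. It assumes a simple set-based comparison in which a predicted relation counts as correct only if it exactly matches a gold relation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    """One document-level relation between two entities (hypothetical schema)."""
    doc_id: str    # e.g. a PubMed ID; values below are placeholders
    head: str      # normalized identifier of the first entity
    tail: str      # normalized identifier of the second entity
    rel_type: str  # illustrative relation label
    novelty: str   # "Novel" or "Background" (illustrative label strings)


def f_score(gold, predicted, use_novelty=False):
    """Return (precision, recall, F-score) under exact set matching.

    When use_novelty is True, the novelty label must also match for a
    prediction to count as a true positive.
    """
    def key(r):
        base = (r.doc_id, r.head, r.tail, r.rel_type)
        return base + (r.novelty,) if use_novelty else base

    gold_keys = {key(r) for r in gold}
    pred_keys = {key(r) for r in predicted}
    true_pos = len(gold_keys & pred_keys)
    precision = true_pos / len(pred_keys) if pred_keys else 0.0
    recall = true_pos / len(gold_keys) if gold_keys else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Toy data: both relations are found, but one novelty label is wrong.
gold = [
    Relation("doc1", "gene:A", "disease:X", "Association", "Novel"),
    Relation("doc1", "chem:B", "disease:X", "Negative_Correlation", "Background"),
]
pred = [
    Relation("doc1", "gene:A", "disease:X", "Association", "Background"),
    Relation("doc1", "chem:B", "disease:X", "Negative_Correlation", "Background"),
]

print(f_score(gold, pred))                    # relation type only
print(f_score(gold, pred, use_novelty=True))  # relation type plus novelty
```

In this toy case the score drops once the novelty label must also match, which mirrors the abstract's observation that extracting novel relations is the harder setting.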

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a821/9487702/f5d9b4e39ad8/bbac282f3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a821/9487702/2956c13fd1ce/bbac282f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a821/9487702/0130ceafad97/bbac282f2.jpg

Similar articles

1
BioRED: a rich biomedical relation extraction dataset.
Brief Bioinform. 2022 Sep 20;23(5). doi: 10.1093/bib/bbac282.
2
The overview of the BioRED (Biomedical Relation Extraction Dataset) track at BioCreative VIII.
Database (Oxford). 2024 Aug 8;2024. doi: 10.1093/database/baae069.
3
Integrating deep learning architectures for enhanced biomedical relation extraction: a pipeline approach.
Database (Oxford). 2024 Aug 28;2024. doi: 10.1093/database/baae079.
4
The biomedical relationship corpus of the BioRED track at the BioCreative VIII challenge and workshop.
Database (Oxford). 2024 Aug 9;2024. doi: 10.1093/database/baae071.
5
BioREx: Improving biomedical relation extraction by leveraging heterogeneous datasets.
J Biomed Inform. 2023 Oct;146:104487. doi: 10.1016/j.jbi.2023.104487. Epub 2023 Sep 4.
6
BioREx: Improving Biomedical Relation Extraction by Leveraging Heterogeneous Datasets.
ArXiv. 2023 Jun 19:arXiv:2306.11189v1.
7
Chemical identification and indexing in full-text articles: an overview of the NLM-Chem track at BioCreative VII.
Database (Oxford). 2023 Mar 7;2023. doi: 10.1093/database/baad005.
8
Benchmarking for biomedical natural language processing tasks with a domain specific ALBERT.
BMC Bioinformatics. 2022 Apr 21;23(1):144. doi: 10.1186/s12859-022-04688-w.
9
Biomedical named entity recognition and linking datasets: survey and our recent development.
Brief Bioinform. 2020 Dec 1;21(6):2219-2238. doi: 10.1093/bib/bbaa054.
10
A span-based joint model for extracting entities and relations of bacteria biotopes.
Bioinformatics. 2021 Dec 22;38(1):220-227. doi: 10.1093/bioinformatics/btab593.

Cited by

1
Do LLMs Surpass Encoders for Biomedical NER?
Proc (IEEE Int Conf Healthc Inform). 2025 Jun;2025:352-358. doi: 10.1109/ICHI64645.2025.00048. Epub 2025 Jul 22.
2
Enhancing biomedical relation extraction with directionality.
Bioinformatics. 2025 Jul 1;41(Supplement_1):i68-i76. doi: 10.1093/bioinformatics/btaf226.
3
CAS: enhancing implicit constrained data augmentation with semantic enrichment for biomedical relation extraction and beyond.
Database (Oxford). 2025 Jul 3;2025. doi: 10.1093/database/baaf025.
4
Artificial Intelligence-assisted Biomedical Literature Knowledge Synthesis to Support Decision-making in Precision Oncology.
AMIA Annu Symp Proc. 2025 May 22;2024:513-522. eCollection 2024.
5
Enhancing biomedical relation extraction through data-centric and preprocessing-robust ensemble learning approach.
Database (Oxford). 2025 May 22;2025. doi: 10.1093/database/baae127.
6
Prompting large language models to extract chemical‒disease relation precisely and comprehensively at the document level: an evaluation study.
PLoS One. 2025 Apr 8;20(4):e0320123. doi: 10.1371/journal.pone.0320123. eCollection 2025.
7
DiMB-RE: mining the scientific literature for diet-microbiome associations.
J Am Med Inform Assoc. 2025 Jun 1;32(6):998-1006. doi: 10.1093/jamia/ocaf054.
8
Biomedical named entity recognition using improved green anaconda-assisted Bi-GRU-based hierarchical ResNet model.
BMC Bioinformatics. 2025 Jan 30;26(1):34. doi: 10.1186/s12859-024-06008-w.
9
BioGSF: a graph-driven semantic feature integration framework for biomedical relation extraction.
Brief Bioinform. 2024 Nov 22;26(1). doi: 10.1093/bib/bbaf025.
10
JTIS: enhancing biomedical document-level relation extraction through joint training with intermediate steps.
Database (Oxford). 2024 Dec 19;2024. doi: 10.1093/database/baae125.