BioREx: Improving biomedical relation extraction by leveraging heterogeneous datasets.

Authors

Lai Po-Ting, Wei Chih-Hsuan, Luo Ling, Chen Qingyu, Lu Zhiyong

Affiliations

National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), MD, 20894 Bethesda, USA.

School of Computer Science and Technology, Dalian University of Technology, 116024 Dalian, China.

Publication information

J Biomed Inform. 2023 Oct;146:104487. doi: 10.1016/j.jbi.2023.104487. Epub 2023 Sep 4.

DOI: 10.1016/j.jbi.2023.104487
PMID: 37673376

Abstract

Biomedical relation extraction (RE) is the task of automatically identifying and characterizing relations between biomedical concepts from free text. RE is a central task in biomedical natural language processing (NLP) research and plays a critical role in many downstream applications, such as literature-based discovery and knowledge graph construction. State-of-the-art methods were used primarily to train machine learning models on individual RE datasets, such as protein-protein interaction and chemical-induced disease relation. Manual dataset annotation, however, is highly expensive and time-consuming, as it requires domain knowledge. Existing RE datasets are usually domain-specific or small, which limits the development of generalized and high-performing RE models. In this work, we present a novel framework for systematically addressing the data heterogeneity of individual datasets and combining them into a large dataset. Based on the framework and dataset, we report on BioREx, a data-centric approach for extracting relations. Our evaluation shows that BioREx achieves significantly higher performance than the benchmark system trained on the individual dataset, setting a new SOTA from 74.4% to 79.6% in F-1 measure on the recently released BioRED corpus. We further demonstrate that the combined dataset can improve performance for five different RE tasks. In addition, we show that on average BioREx compares favorably to current best-performing methods such as transfer learning and multi-task learning. Finally, we demonstrate BioREx's robustness and generalizability in two independent RE tasks not previously seen in training data: drug-drug N-ary combination and document-level gene-disease RE. The integrated dataset and optimized method have been packaged as a stand-alone tool available at https://github.com/ncbi/BioREx.
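
The abstract describes combining individually annotated RE datasets into one larger dataset by resolving their heterogeneity. As a rough illustration of that general idea only, the sketch below maps dataset-specific labels onto a shared label set and merges the records. It is not the BioREx implementation: the `RelationExample` type, the `harmonize` function, the `LABEL_MAP` table, the source names, and the label strings are all hypothetical placeholders chosen for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RelationExample:
    """One relation instance in a shared (hypothetical) schema."""
    text: str
    head: str
    tail: str
    label: str
    source: str


# Hypothetical mapping from (dataset name, dataset-specific label)
# to a label in the shared schema. Real mappings would be curated by hand.
LABEL_MAP = {
    ("ppi", "interacts"): "Association",
    ("cdr", "CID"): "Positive_Correlation",
}


def harmonize(records, source, label_map=LABEL_MAP, default="None"):
    """Convert one dataset's records to the shared schema.

    Labels with no entry in label_map fall back to `default`,
    so unmapped relation types are kept as negatives, not dropped.
    """
    unified = []
    for rec in records:
        label = label_map.get((source, rec["label"]), default)
        unified.append(
            RelationExample(rec["text"], rec["head"], rec["tail"], label, source)
        )
    return unified


# Two toy datasets with different label vocabularies.
ppi = [{"text": "BRCA1 binds BARD1.", "head": "BRCA1",
        "tail": "BARD1", "label": "interacts"}]
cdr = [{"text": "Cisplatin caused nephrotoxicity.", "head": "Cisplatin",
        "tail": "nephrotoxicity", "label": "CID"}]

# The combined dataset is simply the concatenation of the harmonized parts.
combined = harmonize(ppi, "ppi") + harmonize(cdr, "cdr")
```

Keeping the `source` field on each merged record lets a model or an evaluation script still tell which original dataset an example came from.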
