Enhancing biomedical relation extraction through data-centric and preprocessing-robust ensemble learning approach.

Authors

Meesawad Wilailack, Han Jen-Chieh, Hsueh Chun-Yu, Zhang Yu, Hung Hsi-Chuan, Tsai Richard Tzong-Han

Affiliations

Department of Computer Science and Information Engineering, National Central University, No. 300, Zhongda Rd., Zhongli District, Taoyuan 320, Taiwan.

Department of Medical Research, Cathay General Hospital, No. 280, Sec. 4, Ren'ai Rd., Da'an Dist., Taipei 106, Taiwan.

Publication information

Database (Oxford). 2025 May 22;2025. doi: 10.1093/database/baae127.

DOI: 10.1093/database/baae127
PMID: 40402771
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12097206/
Abstract

This paper describes our biomedical relation extraction system, developed for the BioCreative VIII challenge Track 1 (the BioRED Track), which focuses on relation extraction from biomedical literature. The system uses ensemble learning, combining the PubTator API with multiple pretrained Bidirectional Encoder Representations from Transformers (BERT) models. It incorporates several preprocessing inputs: prompt questions, entity ID pairs, and co-occurrence contexts. To improve model comprehension, special tokens and boundary tags are added. Specifically, we use PubMedBERT together with the Max Rule ensemble learning mechanism to combine the outputs of the different classifiers. Our results surpass the established benchmark score, providing a robust benchmark for evaluating performance on this task. Our study also introduces a data-centric approach and demonstrates its effectiveness, underscoring the importance of prioritizing high-quality data instances for improving model performance and robustness.
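
The Max Rule combination mentioned in the abstract can be illustrated with a minimal sketch. It assumes the standard formulation of the max rule: for each class, take the highest probability assigned by any classifier, then predict the class with the largest combined score. The function name `max_rule`, the label names, and the probabilities are hypothetical and chosen only for illustration; the abstract does not specify the authors' implementation.

```python
from typing import Dict, List, Tuple


def max_rule(prob_rows: List[Dict[str, float]]) -> Tuple[str, float]:
    """Combine per-classifier class probabilities with the max rule.

    prob_rows holds one dict per classifier, mapping label -> probability.
    Returns the winning label and its combined (maximum) score.
    """
    if not prob_rows:
        raise ValueError("need at least one classifier output")
    combined: Dict[str, float] = {}
    for row in prob_rows:
        for label, p in row.items():
            # Keep the highest probability seen for each label.
            combined[label] = max(combined.get(label, 0.0), p)
    best = max(combined, key=lambda label: combined[label])
    return best, combined[best]


# Hypothetical outputs of three classifiers for one entity pair.
outputs = [
    {"Association": 0.55, "Positive_Correlation": 0.30, "None": 0.15},
    {"Association": 0.20, "Positive_Correlation": 0.70, "None": 0.10},
    {"Association": 0.40, "Positive_Correlation": 0.35, "None": 0.25},
]
label, score = max_rule(outputs)
print(label, score)  # Positive_Correlation 0.7
```

In this toy example, two of the three classifiers favor `Association`, yet the max rule selects `Positive_Correlation` because one classifier is highly confident in it. This sensitivity to a single confident model is what distinguishes the max rule from majority voting or probability averaging.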

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e1ab/12097206/d27cfaecec2a/baae127f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e1ab/12097206/32a659b3e6a1/baae127f1.jpg
