Enhancing biomedical relation extraction through data-centric and preprocessing-robust ensemble learning approach.

Author information

Meesawad Wilailack, Han Jen-Chieh, Hsueh Chun-Yu, Zhang Yu, Hung Hsi-Chuan, Tsai Richard Tzong-Han

Affiliations

Department of Computer Science and Information Engineering, National Central University, No. 300, Zhongda Rd., Zhongli District, Taoyuan 320, Taiwan.

Department of Medical Research, Cathay General Hospital, No. 280, Sec. 4, Ren'ai Rd., Da'an Dist., Taipei 106, Taiwan.

Publication information

Database (Oxford). 2025 May 22;2025. doi: 10.1093/database/baae127.

Abstract

This paper describes our biomedical relation extraction system, developed for Track 1 (the BioRED Track) of the BioCreative VIII challenge, which focuses on extracting relations from biomedical literature. The system uses an ensemble learning approach that combines the PubTator API with multiple pretrained Bidirectional Encoder Representations from Transformers (BERT) models. Several preprocessed input formats are used, including prompt questions, entity ID pairs, and co-occurrence contexts, and special tokens and boundary tags are added to improve the models' understanding of the input. Specifically, we use PubMedBERT together with a Max Rule ensemble mechanism to combine the outputs of diverse classifiers. Our results surpass the established benchmark score and provide a robust benchmark for evaluating performance on this task. Moreover, our study introduces a data-centric approach and demonstrates its effectiveness, showing that prioritizing high-quality data instances improves model performance and robustness.
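
The abstract does not spell out how the Max Rule combines classifier outputs. A common formulation of the max rule from the classifier-combination literature gives each class the highest probability that any single classifier assigns to it, then predicts the class with the largest such score. The Python sketch that follows assumes each classifier emits a probability vector over the same label set; the function name `max_rule_ensemble`, the label names, and the probability values are hypothetical illustrations, not the authors' code or data.

```python
from typing import Sequence


def max_rule_ensemble(prob_vectors: Sequence[Sequence[float]], labels: Sequence[str]) -> str:
    """Combine classifier outputs with the max rule (hypothetical sketch).

    Assumes each entry of `prob_vectors` is one classifier's per-class
    probability vector, aligned index-by-index with `labels`.
    """
    if not prob_vectors:
        raise ValueError("need at least one classifier output")
    n = len(labels)
    for vec in prob_vectors:
        if len(vec) != n:
            raise ValueError("every probability vector must have one entry per label")
    # For each class, keep the highest probability any classifier assigned to it.
    class_max = [max(vec[i] for vec in prob_vectors) for i in range(n)]
    # Predict the class whose maximum support is largest (first index wins ties).
    best = max(range(n), key=lambda i: class_max[i])
    return labels[best]


# Hypothetical outputs from three classifiers, e.g. models fed different
# preprocessed inputs (prompt question, entity ID pair, co-occurrence context).
labels = ["None", "Association", "Positive_Correlation"]
outputs = [
    [0.50, 0.30, 0.20],
    [0.40, 0.55, 0.05],
    [0.35, 0.25, 0.40],
]
print(max_rule_ensemble(outputs, labels))  # prints "Association"
```

With these numbers, averaging the three vectors would favor `None` (mean 0.417 versus 0.367 for `Association`), whereas the max rule lets a single confident classifier decide the label. This sensitivity to one high-confidence vote is the main design difference from mean or majority voting.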

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e1ab/12097206/32a659b3e6a1/baae127f1.jpg
