Biomedical relation extraction method based on ensemble learning and attention mechanism.

Affiliations

Department of Radiation Oncology, Shanghai East Hospital, Tongji University School of Medicine, Shanghai, China.

Department of Computer College, Beijing Information Science and Technology University, Beijing, China.

Publication information

BMC Bioinformatics. 2024 Oct 18;25(1):333. doi: 10.1186/s12859-024-05951-y.

DOI: 10.1186/s12859-024-05951-y

PMID: 39425010

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11488084/

Abstract

BACKGROUND

Relation extraction (RE) plays a crucial role in biomedical research because it uncovers complex semantic relationships between entities in text. Given the importance of RE in biomedical informatics and the rapidly growing volume of literature, there is an urgent need for advanced computational models that can extract these relationships accurately and efficiently at scale.

RESULTS

This paper proposes SARE, a novel approach that combines Stacking ensemble learning with attention mechanisms to improve biomedical relation extraction. By leveraging multiple pre-trained models, SARE shows improved adaptability and robustness across diverse domains, while the attention mechanisms enable the model to capture and exploit key information in the text more accurately. Compared with the original BERT variant and the domain-specific PubMedBERT model, SARE improved performance by 4.8, 8.7, and 0.8 percentage points on the PPI, DDI, and ChemProt datasets, respectively.
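The abstract does not specify SARE's meta-learner or how base-model outputs are combined, so the following is only a rough sketch of the generic Stacking idea, not the authors' implementation. It assumes each fine-tuned base model (for example, a BERT variant) outputs class probabilities `[P(no relation), P(relation)]` for a sentence containing an entity pair. A logistic-regression meta-learner is then trained on the concatenated probabilities. The names `stack_features` and `MetaLearner` and all numbers are hypothetical. In real Stacking, the meta-features should be out-of-fold predictions to avoid leakage. The attention component is not modeled here.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def stack_features(base_probs):
    """Concatenate per-model class-probability vectors into one meta-feature vector.

    base_probs: one probability vector per base model (hypothetical stand-ins
    for fine-tuned BERT-family relation classifiers).
    """
    return [p for probs in base_probs for p in probs]


class MetaLearner:
    """Hypothetical meta-learner: multinomial logistic regression via gradient descent."""

    def __init__(self, n_features, n_classes, lr=0.5):
        self.w = [[0.0] * n_features for _ in range(n_classes)]
        self.b = [0.0] * n_classes
        self.lr = lr

    def predict_proba(self, x):
        scores = [sum(wi * xi for wi, xi in zip(row, x)) + bi
                  for row, bi in zip(self.w, self.b)]
        return softmax(scores)

    def fit(self, X, y, epochs=200):
        n = len(X)
        for _ in range(epochs):
            # Gradients of the mean cross-entropy loss w.r.t. weights and biases.
            gw = [[0.0] * len(self.w[0]) for _ in self.w]
            gb = [0.0] * len(self.b)
            for x, label in zip(X, y):
                p = self.predict_proba(x)
                for c in range(len(self.b)):
                    err = p[c] - (1.0 if c == label else 0.0)
                    gb[c] += err / n
                    for j, xj in enumerate(x):
                        gw[c][j] += err * xj / n
            # Plain gradient-descent update.
            for c in range(len(self.b)):
                self.b[c] -= self.lr * gb[c]
                for j in range(len(self.w[c])):
                    self.w[c][j] -= self.lr * gw[c][j]
        return self

    def predict(self, x):
        p = self.predict_proba(x)
        return max(range(len(p)), key=p.__getitem__)


# Toy out-of-fold probabilities from three hypothetical base models for six
# training sentences (illustrative numbers only, not from the paper).
labels = [1, 0, 1, 0, 1, 0]
model_a = [[0.2, 0.8], [0.8, 0.2]] * 3  # informative base model
model_b = [[0.5, 0.5], [0.5, 0.5]] * 3  # uninformative base model
model_c = [[0.4, 0.6], [0.6, 0.4]] * 3  # weakly informative base model

X_meta = [stack_features([a, b, c]) for a, b, c in zip(model_a, model_b, model_c)]
meta = MetaLearner(n_features=len(X_meta[0]), n_classes=2).fit(X_meta, labels)

new_positive = stack_features([[0.3, 0.7], [0.5, 0.5], [0.45, 0.55]])
new_negative = stack_features([[0.9, 0.1], [0.5, 0.5], [0.55, 0.45]])
print(meta.predict(new_positive), meta.predict(new_negative))  # prints: 1 0
```

The meta-learner learns how much to trust each base model. Here it relies mostly on the informative model and effectively ignores the uninformative one, which is the intuition behind combining several pre-trained models through Stacking.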

CONCLUSIONS

SARE offers a promising way to improve the accuracy and efficiency of relation extraction in biomedical research, supporting advances in biomedical informatics. The results suggest that combining ensemble learning with attention mechanisms is an effective way to extract complex relationships from biomedical texts. Our code and data are publicly available at https://github.com/GS233/Biomedical.