School of Computer Science and Technology, Dalian University of Technology, Dalian, 116024, China.
Methods. 2024 Nov;231:8-14. doi: 10.1016/j.ymeth.2024.08.007. Epub 2024 Sep 4.
Biomedical event causal relation extraction (BECRE), a subtask of biomedical information extraction, aims to extract causal relations between events from unstructured biomedical text and plays an essential role in many downstream tasks. Existing work has two main problems: (i) shallow features alone offer limited help to a model in establishing potential relationships between biomedical events; (ii) using traditional oversampling to address the data imbalance of BECRE tasks ignores the need for data diversity. This paper proposes a novel BECRE method that addresses both problems using deep knowledge fusion and RoBERTa-based data augmentation. To address the first problem, we fuse deep knowledge, comprising structural event representations and entity relation paths, to establish potential semantic connections between biomedical events. We use a graph convolutional network (GCN) and the predicated tensor model to obtain structural event representations, and we encode entity relation paths using external knowledge bases (GTD, CDR, CHR, GDA and UMLS). A triplet attention mechanism then fuses the structural event representations with the entity relation path information. To address the second problem, we propose a RoBERTa-based data augmentation method: words in the biomedical text other than the biomedical events are masked randomly at a set proportion, and a pre-trained RoBERTa model then generates new data instances for the imbalanced BECRE dataset. Extensive experiments on the Hahn-Powell and BioCause datasets show that the proposed method achieves state-of-the-art performance compared with recent approaches.
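The masking step of the data augmentation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`mask_non_event_tokens`, `augment`, `toy_fill`), the whitespace tokenization, the 0.5 mask ratio, and the example sentence are all assumptions made for illustration. The masked language model is abstracted as a `fill_mask` callable; in practice a pre-trained RoBERTa model would plausibly take that role, and a toy stand-in is used here so the sketch runs without any model download.

```python
import random
from typing import Callable, List, Set

MASK = "<mask>"  # mask token string used by RoBERTa tokenizers


def mask_non_event_tokens(
    tokens: List[str], event_idx: Set[int], ratio: float, rng: random.Random
) -> List[str]:
    """Replace a proportion of the non-event tokens with MASK, chosen at random.

    Tokens at positions in `event_idx` (the biomedical events) are never masked.
    """
    candidates = [i for i in range(len(tokens)) if i not in event_idx]
    if not candidates:
        return list(tokens)
    # At least one token is masked so that every copy differs from the source.
    k = min(len(candidates), max(1, round(ratio * len(candidates))))
    chosen = set(rng.sample(candidates, k))
    return [MASK if i in chosen else tok for i, tok in enumerate(tokens)]


def augment(
    tokens: List[str],
    event_idx: Set[int],
    fill_mask: Callable[[List[str]], List[str]],
    ratio: float = 0.15,
    n_copies: int = 2,
    seed: int = 0,
) -> List[List[str]]:
    """Generate `n_copies` new instances by masking, then filling the masks."""
    rng = random.Random(seed)
    return [
        fill_mask(mask_non_event_tokens(tokens, event_idx, ratio, rng))
        for _ in range(n_copies)
    ]


def toy_fill(masked: List[str]) -> List[str]:
    """Hypothetical stand-in for a masked language model (assumption)."""
    return ["[filled]" if tok == MASK else tok for tok in masked]


# Made-up example sentence; positions 0, 2, 5 and 6 are treated as event tokens.
tokens = "Overexpression of IL-6 strongly induces STAT3 phosphorylation in hepatic cells".split()
event_idx = {0, 2, 5, 6}

for instance in augment(tokens, event_idx, toy_fill, ratio=0.5, n_copies=3, seed=1):
    print(" ".join(instance))
```

Because the event tokens are excluded from masking, each generated instance keeps the annotated events intact while varying the surrounding context, which is the diversity that plain oversampling (duplicating minority-class instances) does not provide.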