School of Information Science and Technology, Dalian Maritime University, Dalian 116026, China.
School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China.
Bioinformatics. 2024 Jul 1;40(7). doi: 10.1093/bioinformatics/btae418.
Biomedical relation extraction at the document level (Bio-DocRE) involves extracting relation instances from biomedical texts that span multiple sentences and often contain various entity concepts, such as genes, diseases, chemicals, and variants. Current approaches to this task are usually graph-based or transformer-based. However, most work maps entity features directly to relation predictions, overlooking the value of entity-pair information as an intermediate state for relation prediction. In this article, we decouple the task into a three-stage process to capture sufficient information for improving relation prediction.
We propose HTGRS, a framework for Bio-DocRE that constructs a hierarchical tree graph (HTG) to integrate key information sources in the document and performs relation reasoning at the entity level. In addition, inspired by semantic segmentation, we formulate the task as a table-filling problem and develop a relation segmentation (RS) module to enhance relation reasoning at the entity-pair level. Extensive experiments on three datasets show that the proposed framework outperforms state-of-the-art methods.
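To make the table-filling formulation concrete, the following is a minimal sketch of the underlying data structure only, not the HTGRS implementation. It assumes a document with N entities is represented as an N x N table whose cell (head, tail) holds the relation label for that entity pair, so that each cell can be classified much like a pixel in semantic segmentation. The function names, the `NA` and `PAD` labels, and the toy entities and relations are all hypothetical, and a real RS module would operate on learned entity-pair features rather than on labels.

```python
from typing import Dict, List, Tuple


def build_relation_table(
    num_entities: int,
    relations: Dict[Tuple[int, int], str],
    no_relation: str = "NA",
) -> List[List[str]]:
    """Fill an N x N entity-pair table; cell (head, tail) holds one label."""
    table = [[no_relation] * num_entities for _ in range(num_entities)]
    for (head, tail), label in relations.items():
        table[head][tail] = label
    return table


def local_window(
    table: List[List[str]],
    head: int,
    tail: int,
    radius: int = 1,
    pad: str = "PAD",
) -> List[List[str]]:
    """Return the square neighbourhood around one cell, padded at the borders.

    This mirrors the local context a convolutional segmentation model
    would see around a single "pixel" (here, one entity pair).
    """
    n = len(table)
    window = []
    for i in range(head - radius, head + radius + 1):
        row = []
        for j in range(tail - radius, tail + radius + 1):
            inside = 0 <= i < n and 0 <= j < n
            row.append(table[i][j] if inside else pad)
        window.append(row)
    return window


# Toy document with three placeholder entities and two placeholder relations.
entities = ["chemical_A", "disease_B", "gene_C"]
gold = {(0, 1): "REL_1", (0, 2): "REL_2"}

table = build_relation_table(len(entities), gold)
for name, row in zip(entities, table):
    print(name, row)
```

Viewing the pairs as a grid is what lets neighbouring entity pairs inform one another: `local_window(table, 0, 0)` returns the 3 x 3 patch around the first cell, which is the kind of context a segmentation-style model would aggregate when predicting that cell.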
Our source code is available at https://github.com/passengeryjy/HTGRS.