Biomedical document-level relation extraction with thematic capture and localized entity pooling.

Author information

Li Yuqing, Shao Xinhui

Affiliations

Department of Mathematics, College of Sciences, Northeastern University, Shenyang, China.

Publication information

J Biomed Inform. 2024 Dec;160:104756. doi: 10.1016/j.jbi.2024.104756. Epub 2024 Nov 30.

Abstract

Compared with sentence-level relation extraction, document-level relation extraction is more challenging because a document typically contains multiple entities, and one entity may be associated with multiple other entities. Existing methods often rely on graph structures to capture path representations between entity pairs. In contrast, this paper introduces a novel approach called local entity pooling, which relies solely on the pre-trained model to identify the bridge entity related to the current entity pair and to generate the reasoning path representation. This technique effectively mitigates the multi-entity problem. Additionally, the model leverages the multi-entity and multi-label characteristics of the document to acquire the document's thematic representation, thereby enhancing the document-level relation extraction task. Experimental evaluations were conducted on two biomedical datasets, CDR and GDA. Our TCLEP (Thematic Capture and Localized Entity Pooling) model achieved Macro-F1 scores of 71.7% and 85.3%, respectively. We also incorporated the local entity pooling and thematic capture modules into the state-of-the-art model, resulting in performance improvements of 1.5% and 0.2% on the respective datasets. These results highlight the advanced performance of our proposed approach.
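The abstract does not specify how local entity pooling scores candidate bridge entities, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that each candidate entity has a non-negative relevance score with respect to the head entity and the tail entity (for example, attention weights taken from the pre-trained encoder), multiplies the two scores so that only candidates relevant to both ends of the pair receive weight, and pools the candidate embeddings with the normalized weights. The function name `local_entity_pooling`, its parameters, and the uniform fallback are all hypothetical.

```python
def local_entity_pooling(head_scores, tail_scores, entity_vecs):
    """Pool candidate bridge entities for one (head, tail) entity pair.

    head_scores[i], tail_scores[i]: assumed non-negative relevance of
    candidate entity i to the head / tail entity (hypothetical inputs).
    entity_vecs[i]: embedding of candidate entity i (list of floats).
    Returns (weights, pooled_vector).
    """
    if not entity_vecs:
        raise ValueError("need at least one candidate entity")
    # A candidate gets weight only if BOTH head and tail are related to it.
    joint = [h * t for h, t in zip(head_scores, tail_scores)]
    total = sum(joint)
    if total == 0:
        # No shared evidence: fall back to uniform weights (an assumption).
        weights = [1.0 / len(joint)] * len(joint)
    else:
        weights = [j / total for j in joint]
    dim = len(entity_vecs[0])
    # Weighted sum of candidate embeddings, one dimension at a time.
    pooled = [
        sum(w * vec[d] for w, vec in zip(weights, entity_vecs))
        for d in range(dim)
    ]
    return weights, pooled


# Toy example: candidate 1 is strongly related to both head and tail.
weights, pooled = local_entity_pooling(
    head_scores=[0.1, 0.8, 0.1],
    tail_scores=[0.2, 0.7, 0.1],
    entity_vecs=[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
)
bridge_index = max(range(len(weights)), key=lambda i: weights[i])
```

In this toy example the candidate with the largest joint score dominates the pooled vector, which is one way a "bridge entity" could be surfaced without building an explicit document graph.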
