Integrating K+ Entities Into Coreference Resolution on Biomedical Texts.

Author information

Li Yufei, Ma Xiaoyong, Zhou Xiangyu, Cheng Penghzhen, He Kai, Gong Tieliang, Li Chen

Publication information

IEEE/ACM Trans Comput Biol Bioinform. 2024 Nov-Dec;21(6):2145-2155. doi: 10.1109/TCBB.2024.3447273. Epub 2024 Dec 10.

Abstract

Biomedical coreference resolution focuses on identifying coreferences in biomedical texts and normally consists of two parts: (i) mention detection, which identifies the textual representations of biological entities, and (ii) finding the coreference links between them. Recently, a popular approach to enhancing the task has been to embed knowledge bases into deep neural networks. However, the way these methods integrate knowledge has a shortcoming: the knowledge may play a larger role in mention detection than in coreference resolution. Specifically, they tend to integrate knowledge prior to mention detection, as part of the embeddings. In addition, they primarily focus on mention-dependent knowledge (KBase), i.e., knowledge entities directly related to mentions, while ignoring the correlated knowledge (K+) between the mentions in a mention pair. For mentions with significant differences in word form, this may limit their ability to extract potential correlations between those mentions. Thus, this paper develops a novel model that integrates both KBase and K+ entities and achieves state-of-the-art performance on the BioNLP and CRAFT-CR datasets. Empirical studies on detecting mentions of different lengths reveal the effectiveness of the KBase entities. The evaluation on cross-sentence and match/mismatch coreference further demonstrates the superiority of the K+ entities in extracting potential background correlations between mentions.

