Effective type label-based synergistic representation learning for biomedical event trigger detection.

Authors

Hao Anran, Yuan Haohan, Hui Siu Cheung, Su Jian

Affiliations

School of Computer Science and Engineering, Nanyang Technological University, 50 Nanyang Avenue, Singapore, Singapore.

Aural & Language Intelligence, Institute for Infocomm Research, Agency for Science, Technology and Research, 1 Fusionopolis Way, Singapore, Singapore.

Publication information

BMC Bioinformatics. 2024 Jul 31;25(1):251. doi: 10.1186/s12859-024-05851-1.

Abstract

BACKGROUND

Detecting event triggers in biomedical texts, which contain domain-specific knowledge and context-dependent terms, is more challenging than detecting them in general-domain texts. Most state-of-the-art models rely mainly on external resources, such as linguistic tools and knowledge bases, to improve system performance. However, they lack effective mechanisms for obtaining semantic clues from label specifications and sentence context. Given its success in image classification, label representation learning is a promising approach to enhancing biomedical event trigger detection models by leveraging the rich semantics of pre-defined event type labels.

RESULTS

In this paper, we propose the Biomedical Label-based Synergistic representation Learning (BioLSL) model, which effectively utilizes event type labels by learning their correlation with trigger words and contextually enriches the representation. The BioLSL model consists of three modules. First, the Domain-specific Joint Encoding module employs a transformer-based, domain-specific pre-trained architecture to jointly encode input sentences and pre-defined event type labels. Second, the Label-based Synergistic Representation Learning module learns the semantic relationships between input texts and event type labels, and generates a Label-Trigger Aware Representation (LTAR) and a Label-Context Aware Representation (LCAR) for enhanced semantic representations. Finally, the Trigger Classification module makes structured predictions, where each label is predicted with respect to its neighbours. We conduct experiments on three benchmark BioNLP datasets, namely MLEE, GE09, and GE11, to evaluate the proposed BioLSL model. Results show that BioLSL achieves state-of-the-art performance, outperforming the baseline models.
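The abstract describes the three modules only at a high level, so the following is a hypothetical sketch rather than the authors' method. It illustrates two generic ingredients in pure Python: enriching each token vector with an attention-weighted summary of event-type label embeddings (a loose stand-in for label-aware representations; the actual LTAR/LCAR formulation is not given in the abstract), and decoding tags with Viterbi search over emission and transition scores, as in a linear-chain CRF, which is one common way to predict each label with respect to its neighbours. Whether BioLSL uses exactly this decoder is an assumption here, and the function names, toy vectors, and transition scores are all illustrative.

```python
import math

# Hypothetical sketch: token vectors and label embeddings are assumed to come
# from a shared encoder; the paper's joint encoder is NOT reproduced here.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def label_aware(tokens, labels):
    """Add to each token vector an attention-weighted summary of the label
    embeddings (scaled dot-product attention: token = query, labels = keys/values).
    Illustrative only; not the paper's LTAR/LCAR."""
    d = len(labels[0])
    enhanced = []
    for t in tokens:
        w = softmax([dot(t, lab) / math.sqrt(d) for lab in labels])
        summary = [sum(wi * lab[k] for wi, lab in zip(w, labels)) for k in range(d)]
        enhanced.append([a + b for a, b in zip(t, summary)])
    return enhanced

def emissions(tokens, labels):
    """Per-token score for every label: dot product with the label embedding."""
    return [[dot(t, lab) for lab in labels] for t in tokens]

def viterbi(em, trans):
    """Highest-scoring tag sequence under a linear-chain model where
    score(y) = sum_i em[i][y_i] + sum_{i>0} trans[y_{i-1}][y_i]."""
    n = len(em[0])
    score = list(em[0])
    backptrs = []
    for row in em[1:]:
        ptrs, new_score = [], []
        for j in range(n):
            # best previous tag for ending in tag j at this position
            best = max(range(n), key=lambda i: score[i] + trans[i][j])
            ptrs.append(best)
            new_score.append(score[best] + trans[best][j] + row[j])
        backptrs.append(ptrs)
        score = new_score
    # trace back from the best final tag
    path = [max(range(n), key=lambda j: score[j])]
    for ptrs in reversed(backptrs):
        path.append(ptrs[path[-1]])
    return path[::-1]

if __name__ == "__main__":
    # Toy 2-d vectors; label 0 = no trigger, label 1 = a hypothetical event type.
    labels = [[1.0, 0.0], [0.0, 1.0]]
    tokens = [[0.9, 0.1], [0.2, 1.5], [1.0, 0.0]]
    trans = [[0.0, -0.5], [0.0, 0.0]]  # hypothetical transition scores
    em = emissions(label_aware(tokens, labels), labels)
    print(viterbi(em, trans))
```

With all transition scores set to zero, Viterbi reduces to a per-token argmax; non-zero transitions let a tag decision depend on its neighbours, which is the property the Trigger Classification module is described as having.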

CONCLUSIONS

The proposed BioLSL model performs well on biomedical event trigger detection without using any external resources. This suggests that label representation learning and context-aware enhancement are promising directions for improving the task. The key enhancement is that BioLSL effectively learns to construct semantic linkages between event mentions and type labels; these linkages capture latent information about label-trigger and label-context relationships in biomedical texts. Moreover, additional experiments show that BioLSL performs exceptionally well with limited training data in data-scarce scenarios.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a025/11293144/2ca811c4ad0d/12859_2024_5851_Fig1_HTML.jpg
