Relation Extraction in Biomedical Texts Based on Multi-Head Attention Model With Syntactic Dependency Feature: Modeling Study.

Author information

Li Yongbin, Hui Linhu, Zou Liping, Li Huyang, Xu Luo, Wang Xiaohua, Chua Stephanie

Affiliations

School of Medical Information Engineering, Zunyi Medical University, Zunyi, China.

Faculty of Computer Science and Information Technology, University Malaysia Sarawak, Sarawak, Malaysia.

Publication information

JMIR Med Inform. 2022 Oct 20;10(10):e41136. doi: 10.2196/41136.

Abstract

BACKGROUND

With the rapid expansion of biomedical literature, biomedical information extraction has attracted increasing attention from researchers. In particular, relation extraction between 2 entities is a long-standing research topic.

OBJECTIVE

This study addressed 2 multiclass relation extraction tasks from the Biomedical Natural Language Processing Workshop 2019 Open Shared Tasks: the relation extraction of Bacteria-Biotope (BB-rel) task and the binary relation extraction of plant seed development (SeeDev-binary) task. In essence, both tasks aim to extract the relation between annotated entity pairs from biomedical texts, which is a challenging problem.

METHODS

Earlier work relied on feature- or kernel-based methods and achieved good performance. For these tasks, we propose a deep learning model that combines several distributed features, such as domain-specific word embedding, part-of-speech embedding, entity-type embedding, distance embedding, and position embedding. A multi-head attention mechanism extracts the global semantic features of the entire sentence. In addition, we introduce a dependency-type feature and the shortest dependency path connecting the 2 candidate entities in the syntactic dependency graph to enrich the feature representation.
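The shortest dependency path between 2 candidate entities can be illustrated with a breadth-first search over the dependency graph, treated as undirected. The following is a minimal sketch and not the authors' implementation: the function name `shortest_dependency_path`, the `EDGES` list, the example sentence, and its hand-written parse with Universal Dependencies style labels are all assumptions made for illustration.

```python
from collections import deque

# Hypothetical parse of "Bacillus subtilis was isolated from soil".
# Token indices: 0 Bacillus, 1 subtilis, 2 was, 3 isolated, 4 from, 5 soil.
# Each edge is (head, dependent, dependency_type); labels are assumed, not from the paper.
EDGES = [
    (3, 1, "nsubj:pass"),
    (1, 0, "compound"),
    (3, 2, "aux:pass"),
    (3, 5, "obl"),
    (5, 4, "case"),
]


def shortest_dependency_path(edges, source, target):
    """Return (token_indices, dependency_types) along a shortest path, or None."""
    # Build an undirected adjacency list, keeping the dependency type of each edge.
    graph = {}
    for head, dependent, dep_type in edges:
        graph.setdefault(head, []).append((dependent, dep_type))
        graph.setdefault(dependent, []).append((head, dep_type))

    # Breadth-first search finds a shortest path in an unweighted graph.
    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            break
        for neighbor, dep_type in graph.get(node, []):
            if neighbor not in parents:
                parents[neighbor] = (node, dep_type)
                queue.append(neighbor)

    if target not in parents:
        return None  # the two tokens are not connected in this parse

    # Walk back from the target to the source, then reverse.
    tokens, types = [], []
    node = target
    while parents[node] is not None:
        previous, dep_type = parents[node]
        tokens.append(node)
        types.append(dep_type)
        node = previous
    tokens.append(source)
    tokens.reverse()
    types.reverse()
    return tokens, types


# Path between the entity heads "subtilis" (1) and "soil" (5).
path_tokens, path_types = shortest_dependency_path(EDGES, 1, 5)
```

In this sketch the path runs through the verb "isolated", and the dependency types collected along the way are the kind of dependency-type feature the abstract describes; how the model actually encodes them is not specified here.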

RESULTS

Experiments show that the proposed model performs well in biomedical relation extraction, achieving F scores of 65.56% and 38.04% on the test sets of the BB-rel and SeeDev-binary tasks, respectively. On the SeeDev-binary task in particular, the F score of our model exceeds that of other existing models, achieving state-of-the-art performance.

CONCLUSIONS

We demonstrated that the multi-head attention mechanism can learn relevant syntactic and semantic features in different representation subspaces and at different positions, yielding a comprehensive feature representation. Moreover, syntactic dependency features can improve model performance by capturing the dependency relations between entities in biomedical texts.
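To make "different representation subspaces" concrete, the sketch below runs standard scaled dot-product multi-head self-attention over a small sequence of token vectors in plain Python. It is an illustration under stated assumptions, not the model from the paper: the weights are random and untrained, the input stands in for the concatenated feature embeddings, and the names `multi_head_self_attention`, `random_matrix`, `matmul`, and `softmax` are hypothetical helpers.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(value - peak) for value in row]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    """Random stand-in for a learned projection matrix (untrained)."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def multi_head_self_attention(x, num_heads, rng):
    """Return (output, per_head_attention_weights) for input x of shape seq_len x d_model."""
    seq_len, d_model = len(x), len(x[0])
    assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
    d_head = d_model // num_heads

    head_outputs, head_weights = [], []
    for _ in range(num_heads):
        # Each head projects the input into its own lower-dimensional subspace.
        q = matmul(x, random_matrix(d_model, d_head, rng))
        k = matmul(x, random_matrix(d_model, d_head, rng))
        v = matmul(x, random_matrix(d_model, d_head, rng))

        # Scaled dot-product attention: softmax(Q K^T / sqrt(d_head)) V.
        k_transposed = [list(col) for col in zip(*k)]
        scores = matmul(q, k_transposed)
        weights = [softmax([s / math.sqrt(d_head) for s in row]) for row in scores]
        head_outputs.append(matmul(weights, v))
        head_weights.append(weights)

    # Concatenate the heads position by position, then apply an output projection.
    concatenated = [sum((head[i] for head in head_outputs), []) for i in range(seq_len)]
    output = matmul(concatenated, random_matrix(d_model, d_model, rng))
    return output, head_weights


rng = random.Random(0)
sentence = random_matrix(4, 8, rng)  # 4 tokens, each an 8-dimensional feature vector
output, weights = multi_head_self_attention(sentence, num_heads=2, rng=rng)
```

Each head produces its own attention distribution over all positions, so different heads are free to weight different token pairs; with trained weights, that is what lets them attend to different kinds of relations.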

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a32f/9634522/93657783d200/medinform_v10i10e41136_fig1.jpg
