Document-Level Biomedical Relation Extraction Using Graph Convolutional Network and Multihead Attention: Algorithm Development and Validation.

Authors

Wang Jian, Chen Xiaoyu, Zhang Yu, Zhang Yijia, Wen Jiabin, Lin Hongfei, Yang Zhihao, Wang Xin

Affiliations

School of Computer Science and Technology, Dalian University of Technology, Dalian, China.

Department of VIP, The Second Hospital of Dalian Medical University, Dalian, China.

Publication information

JMIR Med Inform. 2020 Jul 31;8(7):e17638. doi: 10.2196/17638.

Abstract

BACKGROUND

Automatically extracting relations between chemicals and diseases plays an important role in biomedical text mining. Chemical-disease relation (CDR) extraction aims to extract complex semantic relationships between entities in documents, including both intrasentence and intersentence relations. Most previous methods did not consider dependency syntactic information across sentences, which is very valuable for the relation extraction task, in particular for extracting intersentence relations accurately.

OBJECTIVE

In this paper, we propose a novel end-to-end neural network based on the graph convolutional network (GCN) and multihead attention, which makes use of dependency syntactic information across sentences to improve the CDR extraction task.

METHODS

To improve the performance of intersentence relation extraction, we constructed a document-level dependency graph to capture the dependency syntactic information across sentences. A GCN is applied to learn the feature representation of the document-level dependency graph. The multihead attention mechanism is employed to learn the relatively important context features from different semantic subspaces. To enhance the input representation, our model uses the deep context representation instead of traditional word embeddings.
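The graph convolution step described above can be illustrated with a minimal sketch. It assumes an undirected graph with self-loops, symmetric degree normalization, and a ReLU activation, none of which the abstract specifies. The root-to-root edge linking the two sentences, the helper names, and the toy identity features are likewise hypothetical, chosen only to show how information can cross a sentence boundary.

```python
import math


def build_adjacency(num_nodes, edges):
    """Undirected adjacency matrix with self-loops (an assumption of this sketch)."""
    adj = [[0.0] * num_nodes for _ in range(num_nodes)]
    for i in range(num_nodes):
        adj[i][i] = 1.0
    for u, v in edges:
        adj[u][v] = 1.0
        adj[v][u] = 1.0
    return adj


def normalize(adj):
    """Symmetric normalization D^{-1/2} A D^{-1/2}, where D is the degree matrix."""
    deg = [sum(row) for row in adj]
    n = len(adj)
    return [[adj[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def gcn_layer(adj_norm, features, weight):
    """One graph convolution: ReLU(A_norm @ X @ W)."""
    hidden = matmul(matmul(adj_norm, features), weight)
    return [[max(0.0, x) for x in row] for row in hidden]


# Toy document: tokens 0-2 form sentence 1 (root 1), tokens 3-5 form sentence 2 (root 4).
intra_edges = [(0, 1), (1, 2), (3, 4), (4, 5)]  # dependency edges within each sentence
cross_edges = [(1, 4)]                           # hypothetical link between sentence roots
num_tokens = 6

adj_norm = normalize(build_adjacency(num_tokens, intra_edges + cross_edges))

# Identity features and weights make the output equal to the normalized adjacency,
# so the effect of each edge is directly visible.
identity = [[1.0 if i == j else 0.0 for j in range(num_tokens)]
            for i in range(num_tokens)]
out = gcn_layer(adj_norm, identity, identity)
```

After one layer, token 1 receives a nonzero contribution from token 4 through the cross-sentence edge, while token 0 does not. Stacking a second layer would propagate that information one hop further.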

RESULTS

We evaluated our method on the CDR corpus. The experimental results show that our method achieves an F-measure of 63.5%, which is superior to that of other state-of-the-art methods. At the intrasentence level, our method achieves a precision, recall, and F-measure of 59.1%, 81.5%, and 68.5%, respectively. At the intersentence level, our method achieves a precision, recall, and F-measure of 47.8%, 52.2%, and 49.9%, respectively.
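As a quick consistency check on the figures above, the sketch below recomputes the F-measure from the reported precision and recall. It assumes the F-measure is the balanced F1 score (the harmonic mean of precision and recall), which the abstract does not state explicitly. The function name is illustrative.

```python
def f_measure(precision, recall):
    """Balanced F1: harmonic mean of precision and recall (both in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


intra_f = f_measure(59.1, 81.5)  # intrasentence precision and recall from the abstract
inter_f = f_measure(47.8, 52.2)  # intersentence precision and recall from the abstract

print(round(intra_f, 1), round(inter_f, 1))  # 68.5 49.9
```

Both values match the reported F-measures, which is consistent with the F1 assumption.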

CONCLUSIONS

The GCN model can effectively exploit cross-sentence dependency information to improve the performance of intersentence CDR extraction. Both the deep context representation and multihead attention are helpful in the CDR extraction task.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2a25/7458061/6a6419215eb9/medinform_v8i7e17638_fig1.jpg
