Chemical-gene relation extraction using recursive neural network.
Author information
Department of Computer Science and Engineering, Korea University, Anam-dong 5-ga, Seongbuk-gu, Seoul, South Korea.
Publication information
Database (Oxford). 2018 Jan 1;2018. doi: 10.1093/database/bay060.
In this article, we describe our system for the CHEMPROT task of the BioCreative VI challenge. Although considerable research has been conducted on named entity recognition of genes and drugs, research on extracting the relationships between them remains limited. Extracting relations between chemical compounds and genes from the literature is an important element of pharmacological and clinical research. The CHEMPROT task of BioCreative VI aims to promote the development of text mining systems that can automatically extract relationships between chemical compounds and genes. We tested three recursive neural network approaches to improve the performance of relation extraction. In the BioCreative VI challenge, we developed a tree-Long Short-Term Memory network (tree-LSTM) model with several additional features, including a position feature and a subtree containment feature, and we also applied an ensemble method. After the challenge, we applied additional pre-processing steps to the tree-LSTM model, and we tested the performance of another recursive neural network model called Stack-augmented Parser Interpreter Neural Network (SPINN). Our tree-LSTM model achieved an F-score of 58.53% in the BioCreative VI challenge. Our tree-LSTM model with additional pre-processing and the SPINN model obtained F-scores of 63.7% and 64.1%, respectively. Database URL: https://github.com/arwhirang/recursive_chemprot.
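As an illustration of the stack-based processing that gives SPINN-style models their name, the following is a minimal sketch and not the authors' implementation. It assumes a binary parse expressed as a sequence of shift and reduce transitions. The learned neural composition function is replaced by a plain string-bracketing function so that the order of composition is visible. The function names, the transition labels, and the toy sentence are all hypothetical.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")

# Hypothetical transition labels for this sketch.
SHIFT, REDUCE = "shift", "reduce"


def shift_reduce_encode(
    tokens: Sequence[T],
    transitions: Sequence[str],
    compose: Callable[[T, T], T],
) -> T:
    """Fold a token sequence into one representation by following a
    shift-reduce transition sequence over a stack.

    In a neural model, `compose` would be a learned unit acting on
    vectors; here it is any binary function (an assumption made for
    illustration only).
    """
    buffer = list(reversed(tokens))  # pop() yields the leftmost remaining token
    stack: List[T] = []
    for action in transitions:
        if action == SHIFT:
            if not buffer:
                raise ValueError("shift with empty buffer")
            stack.append(buffer.pop())
        elif action == REDUCE:
            if len(stack) < 2:
                raise ValueError("reduce needs two items on the stack")
            right = stack.pop()
            left = stack.pop()
            stack.append(compose(left, right))
        else:
            raise ValueError(f"unknown transition: {action!r}")
    if buffer or len(stack) != 1:
        raise ValueError("transitions do not describe a complete binary tree")
    return stack[0]


def bracket(left: str, right: str) -> str:
    """Stand-in composition: records which constituents were merged."""
    return f"({left} {right})"


# Toy input (hypothetical): a right-branching parse of a three-token sentence.
tree = shift_reduce_encode(
    ["aspirin", "inhibits", "COX-1"],
    [SHIFT, SHIFT, SHIFT, REDUCE, REDUCE],
    bracket,
)
print(tree)  # (aspirin (inhibits COX-1))
```

Replacing `bracket` with a function that combines two vectors would turn the same loop into a recursive encoder whose final stack element represents the whole sentence. How that representation is combined with the position and subtree containment features mentioned above is not specified in the abstract and is left out of this sketch.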