School of Computer Science, National University of Defense Technology, Changsha, 410073, China.
BMC Bioinformatics. 2017 Dec 28;18(Suppl 16):578. doi: 10.1186/s12859-017-1962-8.
Drug-drug interaction (DDI) extraction needs automated methods to keep pace with the explosive growth of biomedical texts. In recent years, models based on deep neural networks have been developed to meet this need, and they have made significant progress in relation identification.
We propose a dependency-based deep neural network model for DDI extraction. By introducing dependency-based techniques into a bi-directional long short-term memory network (Bi-LSTM), we build three channels: a Linear channel, a DFS channel and a BFS channel. Each channel consists of three network layers, from the bottom up: an embedding layer, an LSTM layer and a max pooling layer. In the embedding layer, we extract two types of features: a distance-based feature and a dependency-based feature. In the LSTM layer, a Bi-LSTM is used in each channel to better capture relation information. Max pooling then selects the most informative features from the entire encoded sequence. Finally, we concatenate the outputs of all channels and feed them to a softmax layer for relation identification.
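The abstract does not spell out how the three channels order their input, so the following is only a minimal sketch under stated assumptions: the Linear channel reads tokens in sentence order, while the DFS and BFS channels read them in depth-first and breadth-first order over the dependency tree. It also assumes the distance-based feature is each token's relative offset to the two candidate drug mentions. The function names, the example sentence and its hand-written parse are hypothetical and not taken from the paper.

```python
from collections import deque


def build_children(heads):
    """Turn a head array into child lists. heads[i] is the head of token i, -1 for the root."""
    children = [[] for _ in heads]
    for i, h in enumerate(heads):
        if h >= 0:
            children[h].append(i)
    return children


def linear_order(heads):
    """Assumed Linear channel: tokens in their original sentence order."""
    return list(range(len(heads)))


def dfs_order(heads):
    """Assumed DFS channel: pre-order depth-first traversal of the dependency tree."""
    children = build_children(heads)
    order, stack = [], [heads.index(-1)]
    while stack:
        node = stack.pop()
        order.append(node)
        # Push children reversed so the leftmost child is visited first.
        stack.extend(reversed(children[node]))
    return order


def bfs_order(heads):
    """Assumed BFS channel: level-by-level traversal of the dependency tree."""
    children = build_children(heads)
    order, queue = [], deque([heads.index(-1)])
    while queue:
        node = queue.popleft()
        order.append(node)
        queue.extend(children[node])
    return order


def relative_distances(n_tokens, drug1, drug2):
    """Assumed distance-based feature: offset of each token to the two drug mentions."""
    return [(i - drug1, i - drug2) for i in range(n_tokens)]


# Hypothetical example with a hand-written parse (not produced by a real parser).
tokens = ["Low-dose", "aspirin", "increases", "warfarin", "levels"]
heads = [1, 2, -1, 4, 2]  # "increases" is the root

for name, order in [("Linear", linear_order(heads)),
                    ("DFS", dfs_order(heads)),
                    ("BFS", bfs_order(heads))]:
    print(name, [tokens[i] for i in order])
print(relative_distances(len(tokens), drug1=1, drug2=3))
```

In a full model, each ordering would be embedded and passed through its own Bi-LSTM and max pooling layer before the channel outputs are concatenated; that part is omitted here.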
To the best of our knowledge, our model achieves new state-of-the-art performance, with an F-score of 72.0% on the DDIExtraction 2013 corpus. Moreover, our approach obtains a much higher Recall than existing methods.
The dependency-based Bi-LSTM model can learn effective relation information with less feature engineering in the DDI extraction task. In addition, the experimental results show that our model balances Precision and Recall well.