Zhao Zhehuan, Yang Zhihao, Luo Ling, Lin Hongfei, Wang Jian
College of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China.
Bioinformatics. 2016 Nov 15;32(22):3444-3453. doi: 10.1093/bioinformatics/btw486. Epub 2016 Jul 27.
Detecting drug-drug interactions (DDIs) has become a vital part of public health safety. Consequently, using text mining techniques to extract DDIs from biomedical literature has received considerable attention. However, this research is still at an early stage, and its performance leaves much room for improvement.
In this article, we present a DDI extraction method based on a syntax convolutional neural network (SCNN). The method proposes a novel word embedding, the syntax word embedding, to exploit the syntactic information of a sentence. Position and part-of-speech features are then introduced to extend the embedding of each word. Next, an auto-encoder is introduced to encode the traditional bag-of-words feature (a sparse 0-1 vector) as a dense real-valued vector. Finally, a combination of embedding-based convolutional features and traditional features is fed to a softmax classifier to extract DDIs from biomedical literature. Experimental results on the DDIExtraction 2013 corpus show that SCNN achieves better performance (an F-score of 0.686) than other state-of-the-art methods.
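The data flow described above can be illustrated with a minimal sketch. It is not the authors' implementation: the embedding table, filter weights, classifier weights and the two-value `dense_bow` vector are random or hand-picked placeholders, part-of-speech features are omitted for brevity, and the label set is an assumption based on the DDIExtraction 2013 annotation scheme. All function and variable names are hypothetical.

```python
import math
import random


def relative_positions(num_tokens, drug1_idx, drug2_idx):
    """Signed distance of every token to the two candidate drug mentions."""
    return [(i - drug1_idx, i - drug2_idx) for i in range(num_tokens)]


def conv_max_pool(rows, filters, window):
    """Apply each filter to every span of `window` rows and keep its maximum tanh response."""
    pooled = []
    for weights in filters:
        best = -math.inf
        for start in range(len(rows) - window + 1):
            patch = [x for row in rows[start:start + window] for x in row]
            response = math.tanh(sum(w * x for w, x in zip(weights, patch)))
            best = max(best, response)
        pooled.append(best)
    return pooled


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)  # fixed seed so the toy run is repeatable
tokens = "DRUG1 may increase the effect of DRUG2".split()
EMB_DIM, NUM_FILTERS, WINDOW = 4, 3, 3
LABELS = ["none", "advice", "effect", "mechanism", "int"]  # assumed label set

# Placeholder for a trained (syntax) word embedding table.
embedding = {tok: [rng.uniform(-1, 1) for _ in range(EMB_DIM)] for tok in tokens}

# Extend each word vector with its scaled distances to the two drug mentions.
positions = relative_positions(len(tokens), 0, len(tokens) - 1)
rows = [
    embedding[tok] + [d1 / len(tokens), d2 / len(tokens)]
    for tok, (d1, d2) in zip(tokens, positions)
]

# Convolution over word windows followed by max pooling.
row_dim = EMB_DIM + 2
filters = [
    [rng.uniform(-1, 1) for _ in range(WINDOW * row_dim)]
    for _ in range(NUM_FILTERS)
]
conv_features = conv_max_pool(rows, filters, WINDOW)

# Placeholder for the auto-encoder's dense encoding of the bag-of-words vector.
dense_bow = [0.2, -0.1]

# Concatenate both feature groups and classify with a softmax layer.
features = conv_features + dense_bow
weights = [[rng.uniform(-1, 1) for _ in features] for _ in LABELS]
probs = softmax([sum(w * x for w, x in zip(row, features)) for row in weights])
prediction = LABELS[probs.index(max(probs))]
```

Because every weight here is random, `prediction` carries no meaning; the sketch only shows how the position-extended embeddings, the pooled convolutional features and the dense traditional features could be combined before the softmax classifier.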
The source code is available for academic use at http://202.118.75.18:8080/DDI/SCNN-DDI.zip
Contact: yangzh@dlut.edu.cn
Supplementary information: Supplementary data are available at Bioinformatics online.