Jiang Zhenchao, Li Lishuang, Huang Degen
IEEE/ACM Trans Comput Biol Bioinform. 2016 Jul-Aug;13(4):634-42. doi: 10.1109/TCBB.2015.2478467. Epub 2015 Sep 14.
In biomedical text mining, distributed word representations have succeeded in capturing semantic regularities, but most existing models are based on shallow context windows, which are insufficient for expressing the meaning of words. To represent words using deeper information, we make explicit the semantic regularities that emerge in word relations, including dependency relations and context relations, and propose a novel architecture for computing continuous vector representations by leveraging those relations. The performance of our model is measured on a word analogy task and a Protein-Protein Interaction Extraction (PPIE) task. Experimental results show that our method performs better overall than other word representation models on the word analogy task and has many advantages in biomedical text mining.
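As an illustration of what a word analogy evaluation typically involves, the sketch below answers "a is to b as c is to ?" with the common vector-offset approach (nearest neighbour of b - a + c by cosine similarity). The abstract does not state which scoring rule the authors used, so the method, the function names `cosine` and `solve_analogy`, and the toy vectors are all assumptions made for illustration only.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def solve_analogy(vectors, a, b, c):
    """Return the word d that best completes 'a : b :: c : d'.

    Assumed scoring rule (vector offset): choose the word whose vector
    is closest, by cosine similarity, to vec(b) - vec(a) + vec(c),
    excluding the three query words themselves.
    """
    target = [vb - va + vc
              for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = (w for w in vectors if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(vectors[w], target))


# Hypothetical 3-dimensional embeddings, hand-made for this example;
# they are not taken from the paper or from any trained model.
toy_vectors = {
    "protein":  [1.0, 0.0, 0.2],
    "proteins": [1.0, 1.0, 0.2],
    "gene":     [0.0, 0.0, 1.0],
    "genes":    [0.0, 1.0, 1.0],
    "cell":     [0.5, 0.0, 0.5],
}

print(solve_analogy(toy_vectors, "protein", "proteins", "gene"))
```

In a real evaluation, accuracy would be the fraction of analogy questions for which the returned word matches the expected answer, computed over embeddings trained on a biomedical corpus.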