Dalian University of Technology, Dalian, China.
IEEE/ACM Trans Comput Biol Bioinform. 2012 Jul-Aug;9(4):1190-202. doi: 10.1109/TCBB.2012.50.
Extracting protein-protein interactions (PPIs) from biomedical literature is an important task in biomedical text mining (BioTM). In this paper, we propose a hash subgraph pairwise (HSP) kernel-based approach for this task. The key to the novel kernel is the use of hierarchical hash labels to express the structural information of subgraphs in linear time. We apply the graph kernel to the dependency graphs that represent sentence structure in the PPI extraction task. The kernel makes efficient use of the full graph structural information and, in particular, captures the contiguous topological and label information that earlier approaches ignored. We evaluate the proposed approach on five publicly available PPI corpora. The experimental results show that our approach significantly outperforms the all-path kernel approach on all five corpora and achieves state-of-the-art performance.
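The abstract does not spell out how the hierarchical hash labels are built, so the following is only a minimal sketch of the general idea, not the authors' HSP kernel. It assumes a Weisfeiler-Lehman-style relabeling, in which each node's label is repeatedly re-hashed together with its neighbours' labels, and it compares two graphs by counting the hash labels they share. The function names (`hierarchical_labels`, `hash_kernel`), the `depth` parameter, and the toy graphs are hypothetical and chosen for illustration.

```python
import hashlib
from collections import Counter


def _hash(text):
    """Stable short hash of a string (the built-in hash() is salted per run)."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()[:8]


def hierarchical_labels(nodes, edges, depth=2):
    """Count the hash labels produced over `depth` refinement rounds.

    nodes: dict mapping node id -> initial label (e.g. a token)
    edges: list of (u, v) pairs, treated as undirected (an assumption)
    """
    neighbours = {n: [] for n in nodes}
    for u, v in edges:
        neighbours[u].append(v)
        neighbours[v].append(u)

    # Level 0: hash of the node's own label.
    current = {n: _hash(label) for n, label in nodes.items()}
    features = Counter(current.values())

    for _ in range(depth):
        # A level-k label hashes the node's level-(k-1) label together with
        # its sorted neighbour labels, so it summarises the k-hop subgraph.
        current = {
            n: _hash(
                current[n] + "|" + ",".join(sorted(current[m] for m in neighbours[n]))
            )
            for n in nodes
        }
        features.update(current.values())
    return features


def hash_kernel(g1, g2, depth=2):
    """Dot product of the two graphs' hash-label count vectors."""
    f1 = hierarchical_labels(*g1, depth=depth)
    f2 = hierarchical_labels(*g2, depth=depth)
    return sum(count * f2[label] for label, count in f1.items())


if __name__ == "__main__":
    # Toy dependency-like graphs: PROT1 - verb - PROT2
    g_binds = ({0: "PROT1", 1: "binds", 2: "PROT2"}, [(0, 1), (1, 2)])
    g_inhibits = ({0: "PROT1", 1: "inhibits", 2: "PROT2"}, [(0, 1), (1, 2)])
    print(hash_kernel(g_binds, g_binds), hash_kernel(g_binds, g_inhibits))
```

Each refinement round visits every edge a constant number of times, so the cost per round is roughly linear in the graph size apart from sorting the neighbour labels. In this sketch, two sentences that differ only in the connecting verb still share their initial protein labels but diverge at deeper levels, which is one way neighbouring topology and labels can enter the similarity score.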