Shen Ying, Zhang Shaohong, Wong Hau-San, Zhang Lin
Int J Data Min Bioinform. 2014;10(1):33-48. doi: 10.1504/ijdmb.2014.062887.
Semantic similarity defined on the Gene Ontology (GO) aims to quantify the functional relationship between GO terms. This paper proposes a novel method, the Shortest Path (SP) algorithm, that measures the semantic similarity of GO terms using both GO structure information and term properties. The algorithm searches for the shortest path connecting two terms and uses the sum of the weights along that path to estimate their semantic similarity. A method for evaluating the nonlinear correlation between two variables is also introduced for validation. Extensive experiments on a protein-protein interaction (PPI) dataset and two public gene expression datasets demonstrate the overall superiority of the SP method over the other state-of-the-art methods evaluated.
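The shortest-path idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state how edge weights are derived or how the path weight is turned into a similarity score, so the toy term names, the weights in EDGES, and the 1 / (1 + d) mapping in similarity() are all assumptions made purely for illustration. The search itself is standard Dijkstra over an undirected weighted graph.

```python
import heapq
from collections import defaultdict

# Hypothetical toy ontology: (child, parent, weight) triples.
# Term names and weights are illustrative assumptions, not real GO data.
EDGES = [
    ("A", "root", 0.5),
    ("B", "root", 0.5),
    ("C", "A", 0.25),
    ("D", "A", 0.25),
    ("D", "B", 1.0),
]


def build_graph(edges):
    """Build an undirected weighted adjacency map from (child, parent, weight) triples."""
    graph = defaultdict(dict)
    for child, parent, weight in edges:
        graph[child][parent] = weight
        graph[parent][child] = weight
    return graph


def shortest_path_weight(graph, source, target):
    """Return the smallest sum of edge weights between two terms (Dijkstra).

    Returns float('inf') when no path connects the two terms.
    """
    best = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        dist, node = heapq.heappop(queue)
        if node == target:
            return dist
        if dist > best.get(node, float("inf")):
            continue  # stale queue entry, a shorter route was already found
        for neighbor, weight in graph[node].items():
            candidate = dist + weight
            if candidate < best.get(neighbor, float("inf")):
                best[neighbor] = candidate
                heapq.heappush(queue, (candidate, neighbor))
    return float("inf")


def similarity(graph, term_a, term_b):
    """Map a path weight d to a score in [0, 1] via 1 / (1 + d).

    This mapping is an assumption for the sketch; the source does not specify one.
    """
    return 1.0 / (1.0 + shortest_path_weight(graph, term_a, term_b))


GRAPH = build_graph(EDGES)
```

With these assumed weights, similarity(GRAPH, "C", "D") is higher than similarity(GRAPH, "C", "B"), because C and D share the direct parent A while C and B are connected only through the root. Treating the graph as undirected is a design choice of the sketch so that sibling terms can be reached through a common ancestor.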