Alvarez Marco A, Yan Changhui
Department of Computer Science, Utah State University, Logan, Utah 84322, USA.
J Bioinform Comput Biol. 2011 Dec;9(6):681-95. doi: 10.1142/s0219720011005641.
Existing methods for calculating semantic similarities between pairs of Gene Ontology (GO) terms and between gene products often rely on external databases, such as Gene Ontology Annotation (GOA), that annotate gene products with GO terms. This dependency leads to some limitations in real applications. Here, we present a semantic similarity algorithm (SSA) that relies exclusively on the GO. When calculating the semantic similarity between a pair of input GO terms, SSA takes into account the shortest path between them, the depth of their nearest common ancestor, and a novel similarity score calculated between the definitions of the involved GO terms. In our work, we use SSA to calculate semantic similarities between pairs of proteins by combining the pairwise semantic similarities between the GO terms that annotate the involved proteins. The reliability of SSA was evaluated by comparing the resulting semantic similarities between proteins with functional similarities between proteins derived from expert annotations or sequence similarity. Comparisons with existing state-of-the-art methods showed that SSA is highly competitive. SSA provides a reliable measure of semantic similarity that is independent of external databases of functional-annotation observations.
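The three ingredients named in the abstract (shortest path, depth of the nearest common ancestor, and a definition-based score) and the term-to-protein combination step can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, since the abstract does not give the formulas: the toy ontology and its term names are hypothetical, paths are restricted to those passing through a common ancestor, the definition score is a plain word-overlap (Jaccard) stand-in for the paper's novel score, the weighting `alpha` is invented, and the best-match average is only one common way to combine pairwise term similarities. It is not the published SSA.

```python
from collections import deque

# Hypothetical toy ontology (child -> parents); not real GO terms.
PARENTS = {
    "root": [],
    "binding": ["root"],
    "catalysis": ["root"],
    "ion_binding": ["binding"],
    "metal_binding": ["ion_binding"],
    "kinase": ["catalysis"],
}

# Hypothetical term definitions used for the text-based score.
DEFS = {
    "root": "molecular function",
    "binding": "selective interaction with a molecule",
    "catalysis": "catalysis of a biochemical reaction",
    "ion_binding": "selective interaction with an ion",
    "metal_binding": "selective interaction with a metal ion",
    "kinase": "catalysis of phosphate transfer",
}


def ancestors_with_distance(term):
    """Map each ancestor (and the term itself) to its minimum edge distance."""
    dist = {term: 0}
    queue = deque([term])
    while queue:
        node = queue.popleft()
        for parent in PARENTS[node]:
            if parent not in dist:
                dist[parent] = dist[node] + 1
                queue.append(parent)
    return dist


def depth(term):
    """Depth is taken here as the shortest distance to the root (assumption)."""
    return ancestors_with_distance(term)["root"]


def path_and_nca_depth(a, b):
    """Shortest path via a common ancestor, and depth of the deepest one."""
    up_a, up_b = ancestors_with_distance(a), ancestors_with_distance(b)
    common = set(up_a) & set(up_b)
    path_len = min(up_a[c] + up_b[c] for c in common)
    nca_depth = max(depth(c) for c in common)
    return path_len, nca_depth


def definition_similarity(a, b):
    """Jaccard word overlap: a stand-in, not the paper's definition score."""
    words_a, words_b = set(DEFS[a].lower().split()), set(DEFS[b].lower().split())
    return len(words_a & words_b) / len(words_a | words_b)


def term_similarity(a, b, alpha=0.5):
    """Illustrative blend of structure and definitions (alpha is an assumption)."""
    if a == b:
        return 1.0
    path_len, nca_depth = path_and_nca_depth(a, b)
    structural = nca_depth / (nca_depth + path_len)  # path_len >= 1 when a != b
    return alpha * structural + (1 - alpha) * definition_similarity(a, b)


def protein_similarity(terms_a, terms_b):
    """Best-match average over the GO terms annotating two proteins."""
    best_a = [max(term_similarity(a, b) for b in terms_b) for a in terms_a]
    best_b = [max(term_similarity(a, b) for a in terms_a) for b in terms_b]
    return (sum(best_a) + sum(best_b)) / (len(best_a) + len(best_b))


if __name__ == "__main__":
    print(term_similarity("metal_binding", "ion_binding"))
    print(protein_similarity(["metal_binding", "kinase"], ["ion_binding"]))
```

In this sketch, closely related terms score higher both because their common ancestor is deep and the connecting path is short, and because their definitions share words. No annotation database is consulted at any point, which is the property the abstract emphasizes.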