Computational Biology Group, Department of Clinical Laboratory Sciences, Institute of Infectious Disease and Molecular Medicine, University of Cape Town Medical School, Observatory, Cape Town 7925, South Africa.
Biomed Res Int. 2013;2013:292063. doi: 10.1155/2013/292063. Epub 2013 Sep 2.
Several approaches have been proposed for computing term information content (IC) and semantic similarity scores within the Gene Ontology (GO) directed acyclic graph (DAG). These approaches have helped improve protein analyses at the functional level. Given their recent proliferation, a unified theory in a well-defined mathematical framework is needed to provide a theoretical basis for validating them. We review the existing IC-based ontological similarity approaches developed in the biomedical and bioinformatics fields and propose a general framework and unified description of all these measures. We conducted an experimental evaluation to assess the impact of IC approaches, different normalization models, and correction factors on the performance of a functional similarity metric. The results show that considering only the parents or only the children of terms when assessing information content or semantic similarity scores degrades the performance of the approach under consideration. This study provides a unified framework for current and future GO semantic similarity measures and a theoretical basis for comparing different approaches. The experimental evaluation of different approaches based on different term information content models paves the way towards a solution to the problem of scoring a term's specificity in the GO DAG.
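To make the notions of term information content and IC-based semantic similarity concrete, the following is a minimal sketch of one widely used annotation-based scheme: IC defined as the negative log of a term's annotation frequency, and a Resnik-style similarity defined as the IC of the most informative common ancestor. It is an illustration under stated assumptions, not the framework proposed in the paper. The toy DAG, the term names, the annotation counts, and all function names are hypothetical.

```python
import math

# Hypothetical toy DAG (not real GO terms): child -> set of direct parents.
PARENTS = {
    "root": set(),
    "a": {"root"},
    "b": {"root"},
    "c": {"a", "b"},
    "d": {"a"},
}

# Hypothetical number of gene products annotated directly to each term.
DIRECT_COUNTS = {"root": 0, "a": 1, "b": 2, "c": 3, "d": 2}


def ancestors(term):
    """Return the term itself together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def propagated_counts():
    """Count each direct annotation once for the term and once per ancestor."""
    totals = {t: 0 for t in PARENTS}
    for term, n in DIRECT_COUNTS.items():
        for anc in ancestors(term):
            totals[anc] += n
    return totals


def information_content():
    """IC(t) = -log p(t), with p(t) the propagated count relative to the root.

    Assumes every term has at least one propagated annotation.
    """
    totals = propagated_counts()
    root_total = totals["root"]
    return {t: math.log(root_total / totals[t]) for t in PARENTS}


def resnik_similarity(t1, t2, ic):
    """IC of the most informative common ancestor of t1 and t2."""
    common = ancestors(t1) & ancestors(t2)
    return max(ic[t] for t in common)


ic = information_content()
print(ic)
print(resnik_similarity("c", "d", ic))
```

In this toy graph the root has an IC of zero and deeper, less frequently annotated terms score higher. The similarity of `c` and `d` equals the IC of their shared parent `a`, which is the only common ancestor besides the root.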