Mazandu Gaston K, Mulder Nicola J
Computational Biology Group, Department of Clinical Laboratory Sciences, Institute of Infectious Disease and Molecular Medicine, University of Cape Town, Cape Town 7925, South Africa.
Adv Bioinformatics. 2012;2012:975783. doi: 10.1155/2012/975783. Epub 2012 May 15.
The wide coverage and biological relevance of the Gene Ontology (GO), confirmed through its successful use in protein function prediction, have led to its growing popularity. To exploit the biological knowledge that GO offers for describing genes or groups of genes, an efficient, scalable similarity measure for GO terms and GO-annotated proteins is needed. While several GO similarity measures exist, none adequately addresses all the issues surrounding the design and usage of the ontology. We introduce a new metric for measuring the distance between two GO terms using the intrinsic topology of the GO directed acyclic graph (GO-DAG), thus enabling the measurement of functional similarity between proteins based on their GO annotations. We assess the performance of this metric using ROC analysis on human protein-protein interaction datasets and correlation coefficient analysis on a selected set of protein pairs from the CESSM online tool. The metric performs well relative to existing annotation-based GO measures. We use this new metric to assess functional similarity between orthologues, and show that it is effective both at determining whether orthologues are annotated with similar functions and at identifying cases where annotation is inconsistent between orthologues.
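The abstract does not spell out the new metric, so the sketch below is only a generic illustration of the underlying idea: scoring GO terms from the DAG's own structure rather than from annotation frequencies, then lifting term scores to protein scores. It is NOT the authors' metric. It swaps in a Seco-style intrinsic information content (IC = 1 - log(descendants + 1) / log(N)), Lin's similarity over the most informative common ancestor, and a best-match average (BMA) over annotation sets. The toy DAG and the term names "A" to "F" are hypothetical.

```python
import math

# Hypothetical toy GO-like DAG: each term maps to its list of parents.
# These are NOT real GO terms; the structure is for illustration only.
PARENTS = {
    "root": [],
    "A": ["root"],
    "B": ["root"],
    "C": ["A"],
    "D": ["A", "B"],
    "E": ["C"],
    "F": ["C", "D"],
}


def ancestors(term):
    """Return the term itself together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def descendant_count(term):
    """Number of terms strictly below `term` in the DAG."""
    return sum(1 for t in PARENTS if t != term and term in ancestors(t))


def intrinsic_ic(term):
    """Seco-style intrinsic IC from topology alone: root -> 0.0, leaves -> 1.0."""
    n = len(PARENTS)
    return 1.0 - math.log(descendant_count(term) + 1) / math.log(n)


def lin_similarity(t1, t2):
    """Lin similarity: 2 * IC(MICA) / (IC(t1) + IC(t2))."""
    if t1 == t2:
        return 1.0
    common = ancestors(t1) & ancestors(t2)
    mica_ic = max(intrinsic_ic(t) for t in common)  # most informative common ancestor
    denom = intrinsic_ic(t1) + intrinsic_ic(t2)
    return 0.0 if denom == 0 else 2.0 * mica_ic / denom


def bma_similarity(terms1, terms2):
    """Best-match average of term similarities between two proteins' annotations."""
    best1 = [max(lin_similarity(a, b) for b in terms2) for a in terms1]
    best2 = [max(lin_similarity(a, b) for a in terms1) for b in terms2]
    return (sum(best1) + sum(best2)) / (len(best1) + len(best2))


if __name__ == "__main__":
    print(lin_similarity("E", "F"))               # MICA is "C"
    print(bma_similarity(["E", "F"], ["E", "D"]))  # two hypothetical proteins
```

Because the IC here depends only on descendant counts, it is unaffected by annotation bias. That property is the general motivation for topology-based measures, although the paper's own metric may weight the DAG differently.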