Computational Biology Research Lab, Department of Computer Science, National University of Computer & Emerging Sciences (NUCES-FAST), Islamabad, 44800, Pakistan.
Sci Rep. 2022 Mar 9;12(1):3818. doi: 10.1038/s41598-022-07624-3.
The Gene Ontology (GO) is a controlled vocabulary that captures the semantics or context of an entity based on its functional role. Biomedical entities are frequently compared with one another to identify similarities, which aids data annotation and knowledge transfer. In this study, we propose GOntoSim, a novel method for determining the functional similarity between genes. GOntoSim quantifies the similarity between pairs of GO terms by taking into account both the graph structure and the information content of nodes. Our measure accurately quantifies the similarity between the ancestors of the GO terms, and it also takes into account their common children. GOntoSim is evaluated on the entire Enzyme Dataset, which contains 10,890 proteins and 97,544 GO annotations. The enzymes are clustered, and the clusters are compared with the gold-standard EC (Enzyme Commission) numbers. At level 1 of the EC numbers for Molecular Function, GOntoSim achieves a purity score of 0.75, compared with 0.47 and 0.51 for GOGO and Wang, respectively. GOntoSim can also handle noisy annotations carrying the IEA (Inferred from Electronic Annotation) evidence code: with IEA annotations included, we achieve a purity score of 0.94 at level 1 of the EC numbers, in contrast to 0.48 for both GOGO and Wang. GOntoSim can be freely accessed at (http://www.cbrlab.org/GOntoSim.html).
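The evaluation above scores each clustering of enzymes by its purity against the gold-standard EC classes. As a rough illustration, the sketch below uses the standard textbook definition of cluster purity, purity = (1/N) Σ_k max_j |ω_k ∩ c_j|. Here ω_k is the set of items in cluster k, c_j is the set of items in gold-standard class j, and N is the total number of items. The paper's own evaluation code and clustering procedure are not described here, so the function name `cluster_purity` and the toy labels are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Hashable, Sequence


def cluster_purity(cluster_labels: Sequence[Hashable],
                   true_labels: Sequence[Hashable]) -> float:
    """Standard cluster purity (hypothetical helper, not the GOntoSim code).

    Each cluster is credited with the number of its members that share the
    cluster's most common gold-standard label. The credits are summed and
    divided by the total number of items.
    """
    if len(cluster_labels) != len(true_labels):
        raise ValueError("cluster_labels and true_labels must have the same length")
    if len(true_labels) == 0:
        raise ValueError("need at least one item")

    # Group gold-standard labels (e.g. level-1 EC classes) by assigned cluster id.
    members = defaultdict(list)
    for cluster, truth in zip(cluster_labels, true_labels):
        members[cluster].append(truth)

    # For each cluster, count the items carrying its majority gold label.
    majority_total = sum(
        Counter(labels).most_common(1)[0][1] for labels in members.values()
    )
    return majority_total / len(true_labels)


if __name__ == "__main__":
    # Toy example: 8 hypothetical enzymes in 3 clusters, with made-up level-1 EC classes.
    clusters = [0, 0, 0, 1, 1, 2, 2, 2]
    ec_level1 = ["1", "1", "2", "3", "3", "2", "5", "4"]
    print(cluster_purity(clusters, ec_level1))  # 0.625
```

In the toy data, cluster 0 contributes 2 (two items in EC class 1), cluster 1 contributes 2, and cluster 2 contributes 1, giving 5/8 = 0.625. Purity rises toward 1.0 as each cluster becomes dominated by a single EC class. Note that it also rewards many small clusters, so it is usually read alongside the number of clusters used.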