School of Information Engineering, Zhengzhou University, Zhengzhou, China.
BMC Bioinformatics. 2022 Jan 20;23(Suppl 1):47. doi: 10.1186/s12859-022-04557-6.
With the establishment and development of Gene Ontology (GO) resources, numerous methods have been proposed to compute the functional similarity of genes, and they have achieved a series of successes in several research fields. The main idea behind these methods is to calculate the information content (IC) of terms, which is essential for measuring the functional similarity of genes. However, most approaches have deficiencies, especially when measuring the IC of both GO terms and their corresponding annotated term sets. As a result, accurately measuring the functional similarity of genes remains challenging.
In this article, we propose a novel method for calculating gene functional similarity that encapsulates the specificity of terms and edges (STE). The method consists of three steps. First, a novel model is put forward to compute the IC of terms; this model exploits the specific structural information of GO terms. Second, the IC of a term set is computed by capturing the genetic structure between the terms contained in the set. Finally, gene functional similarity is measured as the IC overlap ratio of the corresponding annotated gene sets. By leveraging the specificity of edges in the GO graph, the proposed method accurately measures the IC of both GO terms and annotated term sets.
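The three steps above can be illustrated with a deliberately simplified sketch. It does not reproduce STE: as a stand-in, it uses annotation-frequency IC for terms, sums IC over the ancestor closure of a term set (in the style of SimGIC), and ignores edge specificity entirely. The toy ontology, gene names, and function names are all hypothetical and chosen only for illustration.

```python
import math

# Hypothetical toy ontology: term -> set of parent terms (not real GO terms).
PARENTS = {
    "root": set(),
    "A": {"root"},
    "B": {"root"},
    "C": {"A"},
    "D": {"A", "B"},
}

# Hypothetical annotations: gene -> directly annotated terms.
ANNOTATIONS = {
    "g1": {"C"},
    "g2": {"D"},
    "g3": {"B"},
    "g4": {"C", "D"},
}


def ancestors(term):
    """Return the term together with all of its ancestors in the DAG."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def closure(terms):
    """Return the union of the ancestor sets of all terms in the set."""
    result = set()
    for term in terms:
        result |= ancestors(term)
    return result


def term_ic(term):
    """Step 1 (stand-in): IC = -log of the fraction of genes annotated
    to the term directly or through one of its descendants."""
    count = sum(1 for terms in ANNOTATIONS.values() if term in closure(terms))
    if count == 0:
        return 0.0  # simplification: unannotated terms contribute nothing
    return -math.log(count / len(ANNOTATIONS))


def set_ic(terms):
    """Step 2 (stand-in): IC of a term set, summed over its ancestor closure
    so that shared ancestors are counted only once."""
    return sum(term_ic(t) for t in closure(terms))


def similarity(gene_a, gene_b):
    """Step 3: IC overlap ratio of the two genes' annotated term sets."""
    a = closure(ANNOTATIONS[gene_a])
    b = closure(ANNOTATIONS[gene_b])
    union_ic = sum(term_ic(t) for t in a | b)
    shared_ic = sum(term_ic(t) for t in a & b)
    return shared_ic / union_ic if union_ic else 0.0
```

In this toy setting, `similarity("g1", "g4")` is higher than `similarity("g1", "g3")`, because g1 and g4 share the specific term C while g1 and g3 share only the uninformative root. A method such as STE would replace `term_ic` and `set_ic` with models that also weight the edges of the GO graph.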
We conducted experiments on gene functional classification in biological pathways, on gene expression datasets, and on protein-protein interaction datasets. The experimental results show that the proposed STE method outperforms several baseline methods.