Human Genetics, Genome Institute of Singapore, Singapore, Singapore.
School of Computer Science and Engineering, Nanyang Technological University, Singapore, Singapore.
Sci Rep. 2018 Aug 14;8(1):12100. doi: 10.1038/s41598-018-30455-0.
Many measures exist to evaluate functional similarity (FS) between genes, which is widely used in bioinformatics applications including detecting molecular pathways, identifying co-expressed genes, predicting protein-protein interactions, and prioritizing disease genes. Measures of FS between genes are mostly derived from the information content (IC) of the Gene Ontology (GO) terms annotating the genes. However, existing measures, which evaluate the IC of terms based either on the representation of terms in the annotating corpus or on the knowledge embedded in the GO hierarchy, do not consider the enrichment of GO terms by the querying pair of genes. The enrichment of a GO term by a pair of genes depends on whether the term is annotated by one gene (i.e., partial annotation) or by both genes (i.e., complete annotation) in the pair. In this paper, we propose a method that incorporates the enrichment of GO terms by a gene pair in computing their FS, and we show that GO enrichment improves the performance of 46 existing FS measures in the prediction of sequence homologies, gene expression correlations, protein-protein interactions, and disease-associated genes.
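The distinction between partial and complete annotation can be illustrated with a minimal Python sketch. It only separates the GO terms of a gene pair into the two categories; it does not reproduce the enrichment weighting of the proposed method, which is not specified in this abstract. The function name classify_annotations and the two annotation sets are hypothetical, and the GO identifiers are assigned to the genes arbitrarily for illustration.

```python
def classify_annotations(terms_a, terms_b):
    """Split the GO terms of a gene pair into complete and partial annotations.

    complete: terms annotated by both genes in the pair
    partial:  terms annotated by exactly one gene in the pair
    """
    terms_a, terms_b = set(terms_a), set(terms_b)
    complete = terms_a & terms_b   # set intersection
    partial = terms_a ^ terms_b    # symmetric difference
    return complete, partial


# Hypothetical annotation sets for two genes (illustrative only).
gene_a_terms = {"GO:0006915", "GO:0008283", "GO:0005515"}
gene_b_terms = {"GO:0006915", "GO:0005515", "GO:0016301"}

complete, partial = classify_annotations(gene_a_terms, gene_b_terms)
print("complete:", sorted(complete))
print("partial:", sorted(partial))
```

Under this sketch, an enrichment-aware FS measure would treat the terms in the complete set differently from those in the partial set when combining their IC values; how exactly the two are weighted is left open here.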