Zhu Mingzhu, Gao Lei, Guo Zheng, Li Yanhui, Wang Dong, Wang Jing, Wang Chenguang
Department of Bioinformatics, Harbin Medical University, Harbin, 150086, China.
Gene. 2007 Apr 15;391(1-2):113-9. doi: 10.1016/j.gene.2006.12.008. Epub 2006 Dec 22.
Determining protein function is an important task in the post-genomic era. Most current methods operate on a few large functional classes selected from a functional categorization system before prediction begins. GESTs, a prediction approach we proposed previously, is based on gene expression similarity and the taxonomy similarity of functional classes. Unlike many conventional methods, it does not require pre-selecting functional classes, and it can predict specific functions for genes from the functional annotations of their co-expressed genes. In this paper, we extend the method to protein-protein interaction data. We use gene expression data to filter the interacting neighbors of a protein, in order to increase the degree of functional consensus among the neighbors. Using the taxonomy similarity of protein functional classes, the proposed approach can draw on interacting neighbor proteins annotated to nearby classes to support predictions for an uncharacterized protein, and it automatically selects the most appropriate small, specific functional classes in Gene Ontology (GO) during learning. Using three measures designed specifically for functional classes organized in GO, we evaluate how different taxonomy similarity scores affect prediction performance. Based on yeast protein-protein interaction data from MIPS and a dataset of gene expression profiles, we show that the method is effective at predicting protein function down to very specific terms. Compared with the other two taxonomy similarity measures used in this study, the SB-TS measure we propose is a reasonable choice for ontology-based functional prediction when the goal is higher prediction accuracy at an acceptable level of specificity (predicted depth).
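The two steps described in the abstract (filtering interacting neighbors by expression similarity, then letting neighbors annotated to nearby classes support a prediction) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy hierarchy and term names, the protein identifiers, the Pearson threshold of 0.5, and in particular `taxonomy_similarity`, which is a simple depth-based stand-in and not the SB-TS measure or either of the other measures evaluated in the paper.

```python
from math import sqrt

# Toy is-a hierarchy (child -> parent). Hypothetical term names, not real GO IDs.
PARENT = {"root": None, "metabolism": "root", "transport": "root",
          "lipid_metabolism": "metabolism", "sugar_metabolism": "metabolism"}

def ancestors(term):
    """Return the path from term up to the root, term included."""
    path = []
    while term is not None:
        path.append(term)
        term = PARENT[term]
    return path

def depth(term):
    """Number of edges between term and the root."""
    return len(ancestors(term)) - 1

def taxonomy_similarity(a, b):
    """Stand-in score (NOT SB-TS): depth of the deepest shared ancestor
    divided by the depth of the deeper of the two terms."""
    anc_b = set(ancestors(b))
    shared = next(t for t in ancestors(a) if t in anc_b)  # deepest common ancestor
    deepest = max(depth(a), depth(b))
    return depth(shared) / deepest if deepest else 1.0

def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def filter_neighbors(protein, interactions, expression, min_corr=0.5):
    """Keep only interacting neighbors whose profile correlates with the protein's."""
    return sorted(n for n in interactions[protein]
                  if n in expression
                  and pearson(expression[protein], expression[n]) >= min_corr)

def predict(protein, interactions, expression, annotations, min_corr=0.5):
    """Score each class annotated to a kept neighbor by summed similarity support;
    ties are broken in favor of the deeper (more specific) class."""
    neighbors = filter_neighbors(protein, interactions, expression, min_corr)
    candidates = {t for n in neighbors for t in annotations.get(n, ())}
    if not candidates:
        return None
    def support(c):
        return sum(max(taxonomy_similarity(c, t) for t in annotations[n])
                   for n in neighbors if annotations.get(n))
    best = max(candidates, key=lambda c: (support(c), depth(c)))
    return best, support(best)

# Hypothetical toy data: YFG1 is the uncharacterized protein.
EXPRESSION = {"YFG1": [1, 2, 3, 4], "N1": [2, 4, 6, 8], "N2": [1, 2, 3, 5],
              "N3": [4, 3, 2, 1], "N4": [1.0, 2.1, 2.9, 4.2]}
INTERACTIONS = {"YFG1": {"N1", "N2", "N3", "N4"}}
ANNOTATIONS = {"N1": {"lipid_metabolism"}, "N2": {"lipid_metabolism"},
               "N3": {"transport"}, "N4": {"sugar_metabolism"}}

print(filter_neighbors("YFG1", INTERACTIONS, EXPRESSION))
print(predict("YFG1", INTERACTIONS, EXPRESSION, ANNOTATIONS))
```

In this toy run the anti-correlated neighbor N3 is filtered out, and N4, although annotated to a sibling class, still contributes partial support to `lipid_metabolism` through the similarity score. The sketch only considers classes already annotated to a kept neighbor as candidates, which is a simplification of the class selection described above.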