Carroll Steven, Pavlovic Vladimir
Department of Computer Science, Rutgers University, Piscataway, NJ 08854, USA.
Bioinformatics. 2006 Aug 1;22(15):1871-8. doi: 10.1093/bioinformatics/btl187. Epub 2006 May 16.
Probabilistic graphical models have been developed in the past for the task of protein classification. In many cases, classifications obtained from the Gene Ontology have been used to validate these models. In this work we directly incorporate the structure of the Gene Ontology into the graphical representation for protein classification. We present a method in which each protein is represented by a replicate of the Gene Ontology structure, effectively modeling each protein in its own 'annotation space'. Proteins are also connected to one another according to different measures of functional similarity, after which belief propagation is run to make predictions at all ontology terms.
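The representation described above can be illustrated with a minimal sketch. It assumes one binary variable per (protein, term) pair, so each protein carries its own copy of the ontology. Intra-protein edges follow the ontology's child-to-parent links, and inter-protein edges join similar proteins at the same term. Inference uses standard damped loopy sum-product belief propagation. The abstract does not specify the authors' potentials or message schedule. The prior, the `hierarchy_eps` penalty on a child term being on while its parent is off, the agreement bonus scaled by `coupling` times similarity, and every protein and term name (`P1`, `T_a`, ...) are hypothetical choices made for illustration only.

```python
# Minimal sketch (hypothetical potentials and names, not the authors' model):
# one binary variable per (protein, term); damped loopy sum-product BP.

def build_graph(proteins, go_edges, similarities, evidence,
                prior=0.3, hierarchy_eps=0.05, coupling=2.0):
    """Return (unary, edges) for a pairwise binary Markov random field.

    unary[node] = [phi(0), phi(1)]; each edge is (u, v, psi) with psi[x_u][x_v].
    Each protein gets its own replicate of the ontology (its "annotation space").
    """
    terms = sorted({t for edge in go_edges for t in edge})
    unary = {}
    for p in proteins:
        for t in terms:
            label = evidence.get((p, t))  # 1 = annotated, 0 = not, None = unknown
            if label is None:
                unary[(p, t)] = [1.0 - prior, prior]
            elif label:
                unary[(p, t)] = [0.01, 0.99]
            else:
                unary[(p, t)] = [0.99, 0.01]
    edges = []
    # Within a protein: penalize a child term being on while its parent is off.
    hier = [[1.0, 1.0], [hierarchy_eps, 1.0]]  # rows: child state, cols: parent state
    for p in proteins:
        for child, parent in go_edges:
            edges.append(((p, child), (p, parent), hier))
    # Between similar proteins: reward agreement at the same term.
    for p, q, s in similarities:
        a = 1.0 + coupling * s
        for t in terms:
            edges.append(((p, t), (q, t), [[a, 1.0], [1.0, a]]))
    return unary, edges


def loopy_bp(unary, edges, iters=50, damping=0.5):
    """Return {node: approximate P(node = 1)}; assumes no duplicate edges."""
    neighbors = {n: [] for n in unary}
    pots = {}  # directed edge (src, dst) -> potential indexed [x_src][x_dst]
    for u, v, psi in edges:
        pots[(u, v)] = psi
        pots[(v, u)] = [[psi[0][0], psi[1][0]], [psi[0][1], psi[1][1]]]  # transpose
        neighbors[u].append(v)
        neighbors[v].append(u)
    msgs = {key: [0.5, 0.5] for key in pots}
    for _ in range(iters):
        new = {}
        for (src, dst), psi in pots.items():
            out = [0.0, 0.0]
            for xs in (0, 1):
                # Local evidence times all incoming messages except the one from dst.
                inc = unary[src][xs]
                for k in neighbors[src]:
                    if k != dst:
                        inc *= msgs[(k, src)][xs]
                for xd in (0, 1):
                    out[xd] += inc * psi[xs][xd]
            z = out[0] + out[1]
            old = msgs[(src, dst)]
            # Damped update: mix the old message with the normalized new one.
            new[(src, dst)] = [damping * old[i] + (1 - damping) * out[i] / z
                               for i in (0, 1)]
        msgs = new  # synchronous update
    beliefs = {}
    for n in unary:
        b = list(unary[n])
        for k in neighbors[n]:
            b = [b[i] * msgs[(k, n)][i] for i in (0, 1)]
        beliefs[n] = b[1] / (b[0] + b[1])
    return beliefs


proteins = ["P1", "P2", "P3"]
go_edges = [("T_a", "T_root"), ("T_b", "T_root")]  # (child, parent)
similarities = [("P1", "P2", 0.9), ("P2", "P3", 0.2)]  # similarity in [0, 1]
evidence = {("P1", "T_root"): 1, ("P1", "T_a"): 1, ("P1", "T_b"): 0}
unary, edges = build_graph(proteins, go_edges, similarities, evidence)
beliefs = loopy_bp(unary, edges)
for node in sorted(beliefs):
    print(node, round(beliefs[node], 3))
```

Coupling the per-protein ontology copies through similarity edges creates loops. The resulting beliefs are therefore approximate marginals, and damping is included here only as a common aid to convergence. In this toy run, `P2` inherits support for `T_a` from its strongly similar neighbor `P1`, while the weakly linked `P3` is affected less.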
The proposed method was evaluated on a set of 4879 proteins from the Saccharomyces Genome Database whose interactions were also recorded in the GRID project. Results indicate that direct use of the Gene Ontology improves predictive ability, outperforming traditional models that do not take advantage of dependencies among functional terms. Averaged over three different similarity measures and three subontologies, the accuracy of positive and negative term predictions increased by 27.8% and the precision by 2.0%.
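The abstract reports accuracy and precision without defining them. The sketch below assumes the standard confusion-matrix definitions, in which accuracy counts correct positive and negative term predictions over all predictions and precision counts correct positive predictions over all positive predictions. The 0/1 predictions and labels are hypothetical, for example beliefs thresholded at 0.5.

```python
def accuracy_precision(predicted, actual):
    """Standard accuracy and precision for binary term predictions (1 = annotated)."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    tn = sum(1 for p, a in pairs if not p and not a)
    fp = sum(1 for p, a in pairs if p and not a)
    accuracy = (tp + tn) / len(pairs)  # correct positive and negative predictions
    precision = tp / (tp + fp) if tp + fp else 0.0  # correct share of positive calls
    return accuracy, precision


# Hypothetical predictions, e.g. from thresholding beliefs at 0.5.
predicted = [1, 1, 0, 0, 1]
actual = [1, 0, 0, 1, 1]
print(accuracy_precision(predicted, actual))  # (0.6, 0.6666666666666666)
```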
A C/C++/Perl implementation is available from the authors upon request.