Sokolov Artem, Ben-Hur Asa
Department of Computer Science, Colorado State University, Fort Collins, CO 80523, USA.
J Bioinform Comput Biol. 2010 Apr;8(2):357-76. doi: 10.1142/s0219720010004744.
Protein function prediction is an active area of research in bioinformatics, yet the transfer of annotation on the basis of sequence or structural similarity remains widely used as an annotation method. Most current machine learning approaches reduce the problem to a collection of binary classification problems, each asking whether a protein performs a particular function, sometimes followed by a post-processing step that combines the binary outputs. We propose a method that directly predicts a protein's full functional annotation by modeling the structure of the Gene Ontology hierarchy in the framework of kernel methods for structured-output spaces. Our empirical results on the Mousefunc benchmark dataset show improved performance over a BLAST nearest-neighbor method and over algorithms that employ a collection of binary classifiers.
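To illustrate what treating a full annotation as a single structured output can mean, the following is a minimal sketch and not the authors' implementation. It assumes a toy hierarchy with hypothetical term names (not real Gene Ontology identifiers), represents an annotation as a binary vector closed under ancestors, and compares two annotations with a linear kernel on those vectors. The normalized kernel loss shown here is an assumption chosen for illustration; the abstract does not specify the loss used in the paper.

```python
# Toy hierarchy mapping each term to its parents.
# All term names are hypothetical placeholders, not real GO identifiers.
PARENTS = {
    "root": set(),
    "binding": {"root"},
    "catalysis": {"root"},
    "dna_binding": {"binding"},
    "kinase": {"catalysis"},
}
TERMS = sorted(PARENTS)  # fixed term order for the label vectors


def close_under_ancestors(terms):
    """Return the given terms together with all of their ancestors."""
    closed = set()
    stack = list(terms)
    while stack:
        term = stack.pop()
        if term not in closed:
            closed.add(term)
            stack.extend(PARENTS[term])
    return closed


def to_vector(terms):
    """Encode an annotation as a 0/1 vector over TERMS, ancestors included."""
    closed = close_under_ancestors(terms)
    return [1 if t in closed else 0 for t in TERMS]


def label_kernel(a, b):
    """Linear kernel on label vectors: the number of shared terms."""
    return sum(x * y for x, y in zip(to_vector(a), to_vector(b)))


def kernel_loss(a, b):
    """One minus the normalized kernel (an illustrative choice of loss)."""
    norm = (label_kernel(a, a) * label_kernel(b, b)) ** 0.5
    return 1.0 - label_kernel(a, b) / norm


if __name__ == "__main__":
    print(to_vector({"dna_binding"}))
    print(kernel_loss({"dna_binding"}, {"kinase"}))
```

In this toy setup, two annotations that share only the root term receive a large loss, while annotations that share deeper ancestors receive a smaller one. A collection of independent binary classifiers does not encode that dependence between terms by itself.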