School of Informatics and Computing, Indiana University, Bloomington, Indiana 47405, USA.
Proteins. 2011 Jul;79(7):2086-96. doi: 10.1002/prot.23029. Epub 2011 Apr 19.
Understanding protein function is one of the keys to understanding life at the molecular level. It is also important in the context of human disease because many conditions arise as a consequence of alterations of protein function. The recent availability of relatively inexpensive sequencing technology has resulted in thousands of completely or partially sequenced genomes with millions of functionally uncharacterized proteins. Such a large volume of data, combined with the lack of high-throughput experimental assays to functionally annotate proteins, contributes to the growing importance of automated function prediction. Here, we study proteins annotated by Gene Ontology (GO) terms and estimate the accuracy of functional transfer from protein sequence alone. We find that the transfer of GO terms by pairwise sequence alignments is only moderately accurate, with sequence identity (SID) having a surprisingly small influence across a broad range (30-100%). We developed and evaluated functional annotator (FANN), a new predictor of protein function from amino acid sequence. The predictor exploits a multioutput neural network framework which is well suited to simultaneously modeling dependencies between functional terms. Experiments provide evidence that FANN-GO (predictor of GO terms; available from http://www.informatics.indiana.edu/predrag) outperforms standard methods such as transfer by global or local SID as well as GOtcha, a method that incorporates the structure of GO.
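To make the baseline concrete, the following is a minimal sketch of GO term transfer by global sequence identity: the query inherits the terms of its most similar annotated protein. It is not FANN-GO and not the authors' implementation. The function names, the alignment scores (match +1, mismatch -1, gap -1), the definition of identity as identical columns over alignment length, and the toy sequences are all assumptions made for illustration; the 30% cutoff simply borrows the lower end of the SID range mentioned above.

```python
from typing import Dict, Set, Tuple


def global_identity(a: str, b: str, match: int = 1,
                    mismatch: int = -1, gap: int = -1) -> float:
    """Fraction of identical columns in a Needleman-Wunsch global alignment.

    The scoring values are illustrative assumptions, not a substitution matrix.
    """
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        return 0.0
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back one optimal path, counting identical and total columns.
    i, j, identical, columns = n, m, 0, 0
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            identical += int(a[i - 1] == b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
        columns += 1
    return identical / columns


def transfer_go_terms(query: str,
                      annotated: Dict[str, Tuple[str, Set[str]]],
                      min_identity: float = 0.3) -> Tuple[Set[str], float]:
    """Copy GO terms from the annotated protein most identical to the query.

    Returns an empty set when the best identity is below min_identity.
    """
    best_terms: Set[str] = set()
    best_identity = 0.0
    for sequence, terms in annotated.values():
        sid = global_identity(query, sequence)
        if sid > best_identity:
            best_terms, best_identity = terms, sid
    if best_identity < min_identity:
        return set(), best_identity
    return set(best_terms), best_identity


# Toy database: sequences are made up; GO IDs are used only as labels.
database = {
    "P1": ("MKTAYIAKQR", {"GO:0003677"}),
    "P2": ("GGGGSSGGGG", {"GO:0005515"}),
}
terms, sid = transfer_go_terms("MKTAYIAKQK", database)
```

A baseline of this kind treats each query independently and copies terms wholesale from a single hit, so it cannot model dependencies between functional terms, which is the gap the multioutput framework described in the abstract is meant to address.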