Lobley A E, Nugent T, Orengo C A, Jones D T
Department of Computer Science, University College London, London WC1E 6BT, United Kingdom.
Nucleic Acids Res. 2008 Jul 1;36(Web Server issue):W297-302. doi: 10.1093/nar/gkn193. Epub 2008 May 7.
One of the challenges of the post-genomic era is to provide accurate function annotations for the large volumes of data resulting from genome sequencing projects. Most function prediction servers use methods that transfer existing database annotations between orthologous sequences. In contrast, few methods are independent of homology and able to annotate distant and orphan protein sequences. The FFPred server adopts a machine-learning approach to perform function prediction in protein feature space, using feature characteristics predicted from the amino acid sequence. The features are scanned against a library of support vector machines representing over 300 Gene Ontology (GO) classes, and probabilistic confidence scores are returned for each annotation term. The GO term library has been modelled on human protein annotations; however, benchmark testing showed robust performance across higher eukaryotes. FFPred offers important advantages over traditional function prediction servers in its ability to annotate distant homologues and orphan protein sequences, and it achieves greater coverage and classification accuracy than other feature-based prediction servers. A user may upload an amino acid sequence and receive annotation predictions via email. Feature information is provided as easy-to-interpret graphics displayed on the sequence of interest, allowing back-interpretation of the associations between features and function classes.
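The abstract describes scoring a sequence's predicted features against a library of per-GO-term support vector machines and returning a probabilistic confidence for each term. The Python sketch below illustrates that general idea under stated assumptions; it is not FFPred's implementation. Each term is assumed to be a linear SVM whose decision value is mapped to a probability by Platt scaling (a common calibration technique, not confirmed here as FFPred's own). The class `GOTermModel`, the function `annotate`, the feature names, the term labels and all numeric parameters are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class GOTermModel:
    """Hypothetical per-GO-term classifier: a linear SVM plus Platt sigmoid parameters."""

    term: str                  # placeholder GO term label (hypothetical)
    weights: Dict[str, float]  # feature name -> SVM weight (hypothetical values)
    bias: float                # SVM bias term
    platt_a: float             # Platt slope A (normally fitted on held-out data)
    platt_b: float             # Platt offset B

    def decision_value(self, features: Dict[str, float]) -> float:
        # Linear SVM decision function f(x) = w . x + b; absent features count as 0.
        return sum(w * features.get(name, 0.0) for name, w in self.weights.items()) + self.bias

    def probability(self, features: Dict[str, float]) -> float:
        # Platt scaling: P(term | x) = 1 / (1 + exp(A * f(x) + B)).
        f = self.decision_value(features)
        return 1.0 / (1.0 + math.exp(self.platt_a * f + self.platt_b))


def annotate(
    features: Dict[str, float],
    library: List[GOTermModel],
    threshold: float = 0.5,
) -> List[Tuple[str, float]]:
    """Score one sequence's predicted features against every model in the library and
    return (term, probability) pairs at or above the threshold, highest first."""
    scored = [(model.term, model.probability(features)) for model in library]
    hits = [(term, p) for term, p in scored if p >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical sequence-derived features (e.g. counts or fractions from feature predictors).
    features = {"tm_helices": 2.0, "disorder_fraction": 0.1}
    library = [
        GOTermModel("term_A", {"tm_helices": 1.0}, bias=-1.0, platt_a=-2.0, platt_b=0.0),
        GOTermModel("term_B", {"disorder_fraction": 3.0}, bias=-1.0, platt_a=-2.0, platt_b=0.0),
    ]
    for term, p in annotate(features, library):
        print(f"{term}\t{p:.3f}")
```

With these made-up parameters, only term_A clears the default 0.5 threshold. Keeping one independent model per term means each GO class receives its own calibrated score, which matches the abstract's description of a confidence value returned for each annotation term.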