Fleischmann W, Möller S, Gateau A, Apweiler R
The EMBL Outstation - The European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.
Bioinformatics. 1999 Mar;15(3):228-33. doi: 10.1093/bioinformatics/15.3.228.
To cope with the increasing amount of sequence data, reliable automatic annotation tools are required. Together with SWISS-PROT, the TrEMBL database contains nearly all publicly available protein sequences but, in contrast to SWISS-PROT, it carries only limited functional annotation. To improve this situation, we needed to develop a method of automatic annotation that produces highly reliable functional predictions using the language and syntax of SWISS-PROT.
An algorithm was developed and successfully used for the automatic annotation of a test set of unknown proteins. The predicted information included description, function, catalytic activity, cofactors, pathway, subcellular location, quaternary structure, similarity to other proteins, active sites, and keywords. The algorithm showed low coverage (10%) but high specificity and reliability.
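As an illustration of what annotation "in the syntax of SWISS-PROT" could look like, the following minimal sketch renders a set of predicted fields as SWISS-PROT-style flat-file lines and computes coverage as the fraction of proteins that received any annotation. It is not the authors' algorithm, which the abstract does not describe. The function names, the input dictionary layout, and the example values are hypothetical. Active sites are omitted because they are feature-table entries that need sequence positions, and line wrapping is ignored.

```python
# Hypothetical mapping from annotation types named in the abstract to
# SWISS-PROT comment (CC) topics. The mapping itself is an assumption
# made for illustration, not taken from the paper.
CC_TOPICS = {
    "function": "FUNCTION",
    "catalytic activity": "CATALYTIC ACTIVITY",
    "cofactors": "COFACTOR",
    "pathway": "PATHWAY",
    "subcellular location": "SUBCELLULAR LOCATION",
    "quaternary structure": "SUBUNIT",
    "similarity": "SIMILARITY",
}


def format_annotation(predicted):
    """Render predicted fields as SWISS-PROT-style lines (no line wrapping)."""
    lines = []
    if "description" in predicted:
        lines.append(f"DE   {predicted['description']}")
    for field, topic in CC_TOPICS.items():
        if field in predicted:
            lines.append(f"CC   -!- {topic}: {predicted[field]}")
    if "keywords" in predicted:
        lines.append("KW   " + "; ".join(predicted["keywords"]) + ".")
    return lines


def coverage(n_annotated, n_total):
    """Fraction of proteins for which any annotation was produced."""
    return n_annotated / n_total


if __name__ == "__main__":
    # Placeholder values only; these are not results from the paper.
    example = {
        "description": "EXAMPLE PROTEIN.",
        "function": "EXAMPLE FUNCTION.",
        "keywords": ["Hydrolase", "Zinc"],
    }
    print("\n".join(format_annotation(example)))
    print(coverage(10, 100))
```

With this definition, a coverage of 10% means that roughly one protein in ten receives a prediction, while the remainder is left unannotated rather than annotated unreliably.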
The results can be obtained by anonymous ftp from ftp.ebi.ac.uk/pub/databases/sp_tr_nrdb. The source code is available on request from the authors.