A novel method for automatic functional annotation of proteins.

Authors

Fleischmann W, Möller S, Gateau A, Apweiler R

Affiliation

The EMBL Outstation - The European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.

Publication information

Bioinformatics. 1999 Mar;15(3):228-33. doi: 10.1093/bioinformatics/15.3.228.

DOI:10.1093/bioinformatics/15.3.228

PMID:10222410

Abstract

MOTIVATION

To cope with the increasing amount of sequence data, reliable automatic annotation tools are required. Together with SWISS-PROT, the TrEMBL database contains nearly all publicly available protein sequences, but in contrast to SWISS-PROT it carries only limited functional annotation. To improve this situation, we needed to develop a method of automatic annotation that produces highly reliable functional predictions using the language and syntax of SWISS-PROT.

RESULTS

An algorithm was developed and successfully used for the automatic annotation of a test set of unknown proteins. The predicted information included description, function, catalytic activity, cofactors, pathway, subcellular location, quaternary structure, similarity to other proteins, active sites, and keywords. The algorithm showed low coverage (10%), but high specificity and reliability.
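
To make the coverage figure concrete, the following minimal Python sketch shows one way such numbers could be computed. It is not the authors' code. It assumes that coverage means the fraction of test proteins that received at least one predicted annotation, and that specificity is measured as the fraction of predictions confirmed correct; the abstract does not define either term. The `Prediction` class, the function names, and the toy data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    """One predicted annotation (hypothetical structure, not from the paper)."""
    protein_id: str   # identifier of the annotated protein
    annotation: str   # predicted annotation text, e.g. a keyword
    correct: bool     # whether a manual check confirmed the prediction


def coverage(predictions, total_proteins):
    """Assumed definition: share of proteins with at least one prediction."""
    if total_proteins <= 0:
        raise ValueError("total_proteins must be positive")
    annotated = {p.protein_id for p in predictions}
    return len(annotated) / total_proteins


def fraction_correct(predictions):
    """Assumed stand-in for specificity: share of predictions that are correct."""
    if not predictions:
        return 0.0
    return sum(p.correct for p in predictions) / len(predictions)


# Toy data only: 20 test proteins, 2 of which receive predictions.
toy_predictions = [
    Prediction("P1", "Hydrolase", True),
    Prediction("P1", "Cytoplasmic", True),
    Prediction("P2", "Zinc", True),
]
toy_coverage = coverage(toy_predictions, total_proteins=20)
toy_correct = fraction_correct(toy_predictions)
```

Under these assumptions, a method can annotate only a small share of proteins while nearly every annotation it does emit is correct, which is the trade-off the abstract reports.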

AVAILABILITY

The results can be obtained by anonymous ftp from ftp.ebi.ac.uk/pub/databases/sp_tr_nrdb. The source code is available on request from the authors.

