Schneider Michel, Tognolli Michael, Bairoch Amos
Swiss Institute of Bioinformatics, CMU, 1, Rue Michel Servet, 1211 Geneva-4, Switzerland.
Plant Physiol Biochem. 2004 Dec;42(12):1013-21. doi: 10.1016/j.plaphy.2004.10.009. Epub 2004 Dec 15.
The Swiss-Prot protein knowledgebase provides manually annotated entries for all species, but concentrates on entries from model organisms so that representative members of all protein families carry high-quality annotation. A specific Plant Protein Annotation Program (PPAP) was started to cope with the increasing amount of data produced by the complete sequencing of plant genomes. Its main goal is the annotation of proteins from the model plant organism Arabidopsis thaliana. In addition to bibliographic references, experimental results, computed features and sometimes even contradictory conclusions, direct links to specialized databases connect amino acid sequences with the current knowledge in plant sciences. As protein families and groups of plant-specific proteins are regularly reviewed to keep up with current scientific findings, we hope that the wealth of information of Arabidopsis origin accumulated in our knowledgebase, together with the numerous software tools provided on the Expert Protein Analysis System (ExPASy) web site, may help to identify and reveal the function of proteins originating from other plants. Recently, a single, centralized, authoritative resource for protein sequences and functional information, UniProt, was created by joining the information contained in Swiss-Prot, Translation of the EMBL nucleotide sequence database (TrEMBL), and the Protein Information Resource-Protein Sequence Database (PIR-PSD). A growing problem is that an increasing number of nucleotide sequences are not being submitted to the public databases, and thus the proteins inferred from such sequences will have difficulty finding their way into the Swiss-Prot or TrEMBL databases.