Apweiler R, Gateau A, Contrino S, Martin M J, Junker V, O'Donovan C, Lang F, Mitaritonna N, Kappus S, Bairoch A
EMBL Outstation-The European Bioinformatics Institute, Cambridge, UK.
Proc Int Conf Intell Syst Mol Biol. 1997;5:33-43.
SWISS-PROT is a curated protein sequence database which strives to provide a high level of annotation, a minimal level of redundancy and a high level of integration with other databases. Ongoing genome sequencing projects have dramatically increased the number of protein sequences to be incorporated into SWISS-PROT. Since we do not want to dilute the quality standards of SWISS-PROT by incorporating sequences without proper sequence analysis and annotation, we cannot speed up the incorporation of new incoming data indefinitely. However, as we also want to make the sequences available as fast as possible, we introduced TREMBL (TRanslation of EMBL nucleotide sequence database), a supplement to SWISS-PROT. TREMBL consists of computer-annotated entries in SWISS-PROT format derived from the translation of all coding sequences (CDS) in the EMBL nucleotide sequence database, except for CDS already included in SWISS-PROT. While TREMBL is already of immense value, its computer-generated annotation does not match the quality of SWISS-PROT's. The main difference is in the protein functional information attached to sequences. With this in mind, we are dedicating substantial effort to developing and applying computer methods to enhance the functional information attached to TREMBL entries.
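The derivation rule described above (translate every CDS, then leave out those already covered by SWISS-PROT) can be illustrated with a minimal Python sketch. It is not the actual TREMBL production pipeline: the function names `translate_cds` and `build_supplement` are hypothetical, only the standard genetic code is used, and "already included in SWISS-PROT" is approximated here by an exact match on the translated sequence, which is an assumption made for illustration.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_cds(cds):
    """Translate a coding sequence up to the first stop codon.

    Codons that are not in the table (for example, ones containing
    ambiguity codes) are rendered as 'X'.
    """
    cds = cds.upper().replace("U", "T")
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE.get(cds[i:i + 3], "X")
        if aa == "*":  # stop codon ends the translation
            break
        protein.append(aa)
    return "".join(protein)


def build_supplement(cds_records, curated_proteins):
    """Return translations of CDS whose protein is not in the curated set.

    cds_records: dict mapping a (hypothetical) CDS identifier to its
        nucleotide sequence.
    curated_proteins: iterable of protein sequences already curated.
    Assumption: exclusion is by exact sequence identity.
    """
    curated = set(curated_proteins)
    supplement = {}
    for cds_id, cds in cds_records.items():
        protein = translate_cds(cds)
        if protein and protein not in curated:
            supplement[cds_id] = protein
    return supplement
```

For instance, calling `build_supplement({"cds1": "ATGGCCTAA", "cds2": "ATGAAATGA"}, ["MA"])` keeps only `cds2`, because the translation of `cds1` is already present in the curated set.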