A database of unique protein sequence identifiers for proteome studies.

Author information

Babnigg György, Giometti Carol S

Affiliation

Protein Mapping Group, Biosciences Division, Argonne National Laboratory, IL 60439, USA.

Publication information

Proteomics. 2006 Aug;6(16):4514-22. doi: 10.1002/pmic.200600032.

DOI:10.1002/pmic.200600032

PMID:16858731

Abstract

In proteome studies, identification of proteins requires searching protein sequence databases. The public protein sequence databases (e.g., NCBInr, UniProt) each contain millions of entries, and private databases add thousands more. Although much of the sequence information in these databases is redundant, each database uses distinct identifiers for the identical protein sequence and often contains unique annotation information. Users of one database obtain a database-specific sequence identifier that is often difficult to reconcile with the identifiers from a different database. When multiple databases are used for searches or the databases being searched are updated frequently, interpreting the protein identifications and associated annotations can be problematic. We have developed a database of unique protein sequence identifiers called Sequence Globally Unique Identifiers (SEGUID) derived from primary protein sequences. These identifiers serve as a common link between multiple sequence databases and are resilient to annotation changes in either public or private databases throughout the lifetime of a given protein sequence. The SEGUID Database can be downloaded (http://bioinformatics.anl.gov/SEGUID/) or easily generated at any site with access to primary protein sequence databases. Since SEGUIDs are stable, predictions based on the primary sequence information (e.g., pI, Mr) can be calculated just once; we have generated approximately 500 different calculations for more than 2.5 million sequences. SEGUIDs are used to integrate MS and 2-DE data with bioinformatics information and provide the opportunity to search multiple protein sequence databases, thereby providing a higher probability of finding the most valid protein identifications.
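The abstract describes SEGUIDs as identifiers derived only from the primary sequence. Because of that, the same sequence gets the same key no matter which database supplied it. The following is a minimal Python sketch of that idea. It assumes the commonly cited SEGUID convention: take the SHA-1 digest of the upper-cased sequence, Base64-encode it, and strip the trailing "=" padding. The helper `link_identifiers` and its (accession, sequence) input format are hypothetical illustrations of using SEGUIDs as a shared key across databases. They are not the authors' code.

```python
import base64
import hashlib
from collections import defaultdict


def seguid(sequence: str) -> str:
    """Return a SEGUID-style checksum for a protein sequence.

    Assumed convention: SHA-1 digest of the upper-cased residue string,
    Base64-encoded, with trailing '=' padding removed (27 characters).
    """
    # Drop whitespace (e.g. FASTA line breaks) so wrapping does not change the key.
    residues = "".join(sequence.split()).upper()
    digest = hashlib.sha1(residues.encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii").rstrip("=")


def link_identifiers(records):
    """Group database-specific accessions by the SEGUID of their sequence.

    `records` is an iterable of (accession, sequence) pairs, e.g. parsed
    from NCBInr and UniProt FASTA files (hypothetical input format).
    """
    by_seguid = defaultdict(set)
    for accession, sequence in records:
        by_seguid[seguid(sequence)].add(accession)
    return dict(by_seguid)


if __name__ == "__main__":
    records = [
        ("sp|EXAMPLE1", "MKTAYIAKQR"),   # hypothetical UniProt-style accession
        ("gi|000000", "mktayiakqr"),     # hypothetical NCBI-style accession, same sequence
        ("sp|EXAMPLE2", "MKVLAAGIVG"),
    ]
    for key, accessions in link_identifiers(records).items():
        print(key, sorted(accessions))
```

The key depends only on the residues. Accessions from different databases that share a sequence therefore collapse into one entry. Sequence-derived values such as pI or Mr could then be computed and cached once per SEGUID rather than once per accession.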

Similar articles

A database of unique protein sequence identifiers for proteome studies.

Proteomics. 2006 Aug;6(16):4514-22. doi: 10.1002/pmic.200600032.

A comprehensive dictionary of protein accession codes for complete protein accession identifier alias resolving.

Proteomics. 2006 Aug;6(15):4223-6. doi: 10.1002/pmic.200600018.

PROMPT: a protein mapping and comparison tool.

BMC Bioinformatics. 2006 Jul 4;7:331. doi: 10.1186/1471-2105-7-331.

EXProt--a database for EXPerimentally verified Protein functions.

In Silico Biol. 2002;2(1):1-4.

PLIPS, an automatically collected database of protein lists reported by proteomics studies.

J Proteome Res. 2009 Mar;8(3):1193-7. doi: 10.1021/pr800804d.

SMART 4.0: towards genomic data integration.

Nucleic Acids Res. 2004 Jan 1;32(Database issue):D142-4. doi: 10.1093/nar/gkh088.

Mapping PDB chains to UniProtKB entries.

Bioinformatics. 2005 Dec 1;21(23):4297-301. doi: 10.1093/bioinformatics/bti694. Epub 2005 Sep 27.

The apoptosis database.

Cell Death Differ. 2003 Jun;10(6):621-33. doi: 10.1038/sj.cdd.4401230.

MitoProteome: mitochondrial protein sequence database and annotation system.

Nucleic Acids Res. 2004 Jan 1;32(Database issue):D463-7. doi: 10.1093/nar/gkh048.

PAnnBuilder: an R package for assembling proteomic annotation data.

Bioinformatics. 2009 Apr 15;25(8):1094-5. doi: 10.1093/bioinformatics/btp100. Epub 2009 Feb 23.

Cited by

Computational approaches for identifying neuropeptides: A comprehensive review.

Mol Ther Nucleic Acids. 2024 Nov 28;36(1):102409. doi: 10.1016/j.omtn.2024.102409. eCollection 2025 Mar 11.

Seq2scFv: a toolkit for the comprehensive analysis of display libraries from long-read sequencing platforms.

MAbs. 2024 Jan-Dec;16(1):2408344. doi: 10.1080/19420862.2024.2408344. Epub 2024 Oct 8.

Remote homology clustering identifies lowly conserved families of effector proteins in plant-pathogenic fungi.

Microb Genom. 2021 Sep;7(9). doi: 10.1099/mgen.0.000637.

SeqRepo: A system for managing local collections of biological sequences.

PLoS One. 2020 Dec 3;15(12):e0239883. doi: 10.1371/journal.pone.0239883. eCollection 2020.

Soil and leaf litter metaproteomics-a brief guideline from sampling to understanding.

FEMS Microbiol Ecol. 2016 Nov;92(11). doi: 10.1093/femsec/fiw180. Epub 2016 Aug 21.

Gene selection and cloning approaches for co-expression and production of recombinant protein-protein complexes.

J Struct Funct Genomics. 2015 Dec;16(3-4):113-28. doi: 10.1007/s10969-015-9200-y. Epub 2015 Dec 15.

The M5nr: a novel non-redundant database containing protein sequences and annotations from multiple sources and associated tools.

BMC Bioinformatics. 2012 Jun 21;13:141. doi: 10.1186/1471-2105-13-141.

Improvements in the Protein Identifier Cross-Reference service.

Nucleic Acids Res. 2012 Jul;40(Web Server issue):W276-80. doi: 10.1093/nar/gks338. Epub 2012 Apr 27.

iRefScape. A Cytoscape plug-in for visualization and data mining of protein interaction data from iRefIndex.

BMC Bioinformatics. 2011 Oct 5;12:388. doi: 10.1186/1471-2105-12-388.

iRefIndex: a consolidated protein interaction database with provenance.

BMC Bioinformatics. 2008 Sep 30;9:405. doi: 10.1186/1471-2105-9-405.
