Leinonen Rasko, Diez Federico Garcia, Binns David, Fleischmann Wolfgang, Lopez Rodrigo, Apweiler Rolf
EMBL Outstation, The European Bioinformatics Institute (EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.
Bioinformatics. 2004 Nov 22;20(17):3236-7. doi: 10.1093/bioinformatics/bth191. Epub 2004 Mar 25.
UniProt Archive (UniParc) is the most comprehensive non-redundant protein sequence database available. Its protein sequences are drawn from the major publicly accessible protein sequence resources, and all new and updated sequences from these sources are collected and loaded into UniParc daily to keep coverage complete. To avoid redundancy, each unique sequence is stored only once and assigned a stable protein identifier, which can then be used within UniParc to identify the same protein in all source databases. When a sequence is loaded, database cross-references are created that link it to each of the source entries it came from. As a result, performing a sequence search against UniParc is equivalent to performing the same search against all databases cross-referenced by UniParc. UniParc contains only protein sequences and database cross-references; all other information must be retrieved from the source databases.
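The store-once, cross-reference-everywhere scheme described above can be sketched as a toy in-memory archive. This is a minimal illustration under stated assumptions, not UniParc's actual implementation: the class and method names (`SequenceArchive`, `load`, `lookup`, `get_sequence`) are hypothetical, the identifier format (a `UPI` prefix plus ten hexadecimal digits from a running counter) is an assumption, and sequences are treated as identical only on an exact, case-insensitive match.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class CrossRef:
    """Link from an archived sequence back to the source entry it came from."""
    database: str   # name of the source database (hypothetical in this sketch)
    accession: str  # accession of the entry in that source database


@dataclass
class SequenceArchive:
    """Toy non-redundant store: each distinct sequence is kept exactly once."""
    _id_by_seq: dict[str, str] = field(default_factory=dict)          # sequence -> stable id
    _seq_by_id: dict[str, str] = field(default_factory=dict)          # stable id -> sequence
    _xrefs: dict[str, set[CrossRef]] = field(default_factory=dict)    # stable id -> cross-refs

    def load(self, sequence: str, database: str, accession: str) -> str:
        """Load one source record and return the stable id of its sequence."""
        seq = sequence.upper()  # normalise case so "mkt..." and "MKT..." match
        uid = self._id_by_seq.get(seq)
        if uid is None:
            # New sequence: assign the next id (assumed format: "UPI" + 10 hex digits).
            uid = f"UPI{len(self._id_by_seq) + 1:010X}"
            self._id_by_seq[seq] = uid
            self._seq_by_id[uid] = seq
            self._xrefs[uid] = set()
        # Known sequence: nothing new is stored except the cross-reference.
        self._xrefs[uid].add(CrossRef(database, accession))
        return uid

    def get_sequence(self, uid: str) -> Optional[str]:
        """Resolve a stable id back to its sequence."""
        return self._seq_by_id.get(uid)

    def lookup(self, sequence: str) -> tuple[Optional[str], list[CrossRef]]:
        """Exact-match search: the stable id plus every source holding this sequence."""
        uid = self._id_by_seq.get(sequence.upper())
        if uid is None:
            return None, []
        refs = sorted(self._xrefs[uid], key=lambda r: (r.database, r.accession))
        return uid, refs
```

A usage example with made-up database names and accessions: `archive.load("MKTAYIAKQR", "DB_A", "ACC1")` and `archive.load("MKTAYIAKQR", "DB_B", "ACC2")` return the same identifier, and `archive.lookup("MKTAYIAKQR")` then reports both sources, which mirrors the claim that one search against the archive covers every cross-referenced database. Keying on the full sequence string keeps the deduplication exact; a larger store might instead index a checksum of each sequence and confirm matches against the stored sequence to rule out collisions.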