Wu Cathy H, Huang Hongzhan, Arminski Leslie, Castro-Alvear Jorge, Chen Yongxing, Hu Zhang-Zhi, Ledley Robert S, Lewis Kali C, Mewes Hans-Werner, Orcutt Bruce C, Suzek Baris E, Tsugita Akira, Vinayaka C R, Yeh Lai-Su L, Zhang Jian, Barker Winona C
National Biomedical Research Foundation, Georgetown University Medical Center, 3900 Reservoir Road, NW, Washington, DC 20007, USA.
Nucleic Acids Res. 2002 Jan 1;30(1):35-7. doi: 10.1093/nar/30.1.35.
The Protein Information Resource (PIR) serves as an integrated public resource of functional annotation of protein data to support genomic/proteomic research and scientific discovery. The PIR, in collaboration with the Munich Information Center for Protein Sequences (MIPS) and the Japan International Protein Information Database (JIPID), produces the PIR-International Protein Sequence Database (PSD), the major annotated protein sequence database in the public domain, containing about 250 000 proteins. To improve protein annotation and the coverage of experimentally validated data, a bibliography submission system has been developed for scientists to submit, categorize and retrieve literature information. Comprehensive protein information is available from iProClass, which includes family classification at the superfamily, domain and motif levels, structural and functional features of proteins, and cross-references to over 40 biological databases. To provide timely and comprehensive protein data with source attribution, we have introduced a non-redundant reference protein database, PIR-NREF. The database consists of about 800 000 proteins collected from PIR-PSD, SWISS-PROT, TrEMBL, GenPept, RefSeq and PDB, with composite protein names and literature data. To promote database interoperability, we provide XML data distribution and an open database schema, and we adopt common ontologies. The PIR web site (http://pir.georgetown.edu/) features data mining and sequence analysis tools for information retrieval and functional identification of proteins based on both sequence and annotation information. The PIR databases and other files are also available by FTP (ftp://nbrfa.georgetown.edu/pir_databases).
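The idea of a non-redundant reference database with source attribution and composite names can be illustrated with a small sketch. It assumes, for illustration only, that records are merged when their sequences are exactly identical (case-insensitive). The actual PIR-NREF merging rules may apply further criteria, such as grouping by source organism. The `SourceEntry`/`NrefEntry` types, `build_nonredundant` function, accessions and sequence fragments are all hypothetical and are not the PIR's own implementation.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SourceEntry:
    """One protein record from a member database (hypothetical fields)."""
    database: str   # e.g. "PIR-PSD", "SWISS-PROT", "RefSeq"
    accession: str  # source-specific identifier (illustrative values only)
    name: str       # protein name as annotated by that source
    sequence: str   # amino-acid sequence


@dataclass
class NrefEntry:
    """One non-redundant entry: a unique sequence plus every source reporting it."""
    sequence: str
    sources: list = field(default_factory=list)

    def composite_name(self) -> str:
        # Keep each distinct source name once, in first-seen order.
        names = []
        for entry in self.sources:
            if entry.name not in names:
                names.append(entry.name)
        return "; ".join(names)

    def attribution(self) -> list:
        # (database, accession) pairs record where the sequence came from.
        return [(e.database, e.accession) for e in self.sources]


def build_nonredundant(entries):
    """Merge records with identical sequences (assumed rule: exact match, case-insensitive)."""
    merged = {}  # dicts preserve insertion order, so output follows input order
    for entry in entries:
        key = entry.sequence.upper()
        if key not in merged:
            merged[key] = NrefEntry(sequence=key)
        merged[key].sources.append(entry)
    return list(merged.values())


if __name__ == "__main__":
    # Toy records; accessions and sequence fragments are made up.
    records = [
        SourceEntry("PIR-PSD", "A00001", "cytochrome c", "GDVEKGKKIFVQK"),
        SourceEntry("SWISS-PROT", "P00001", "Cytochrome c", "GDVEKGKKIFVQK"),
        SourceEntry("RefSeq", "NP_000001", "cytochrome c", "gdvekgkkifvqk"),
        SourceEntry("PDB", "1ABC_A", "lysozyme", "KVFGRCELAAAMK"),
    ]
    for nref in build_nonredundant(records):
        print(nref.composite_name(), nref.attribution())
```

Keying the merge on the sequence itself means each unique sequence is stored once. Every contributing record is kept, so names and accessions from all sources remain traceable.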