Department of Chemistry and Biomolecular Sciences and ARC Centre of Excellence in Bioinformatics, Macquarie University, Sydney, NSW, Australia.
J Proteome Res. 2013 Jun 7;12(6):2504-10. doi: 10.1021/pr301082p. Epub 2013 Jan 11.
The Chromosome-Centric Human Proteome Project aims to systematically map all human proteins, chromosome by chromosome, in a gene-centric manner through dedicated efforts from national and international teams. This mapping will lead to a knowledge-based resource that defines the full set of proteins encoded in each chromosome and lays the foundation for a standardized approach to analyzing the massive proteomic data sets currently being generated. The neXtProt database lists 946 proteins as the human proteome of chromosome 7. However, 170 (18%) of these proteins have no evidence at the proteomic, antibody, or structural levels and are considered "missing" in this study because they lack experimental support. We have developed a protocol for the functional annotation of these "missing" proteins by integrating several bioinformatics analysis and annotation tools, sequential BLAST homology searches, protein domain/motif and Gene Ontology (GO) mapping, and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis. Using the BLAST search strategy, we identified reviewed non-human mammalian homologues with protein evidence for 90 "missing" proteins, while another 38 had reviewed non-human mammalian homologues. Putative functional annotations were assigned to 27 of the remaining 43 novel proteins. Proteotypic peptides have been computationally generated to facilitate rapid identification of these proteins. Four of the "missing" chromosome 7 proteins have been substantiated by the ENCODE proteogenomic peptide data.
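The sequential search described above can be read as a tiered triage: each "missing" protein is placed in the first evidence tier that any of its BLAST hits satisfies. The following is a minimal sketch of that idea only, not the authors' implementation. The `Hit` record, its field names, and the tier labels are hypothetical, and no similarity or E-value cutoffs are applied because the abstract does not state any.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Hit:
    """One BLAST hit for a query protein (hypothetical record layout)."""
    subject_id: str
    reviewed: bool            # subject is a reviewed (curated) database entry
    non_human_mammal: bool    # subject comes from a non-human mammal
    protein_evidence: bool    # subject has protein-level evidence


def triage(hits: Iterable[Hit]) -> str:
    """Return the first tier satisfied by any hit, checked in order."""
    # Only reviewed, non-human mammalian subjects count as candidates.
    candidates = [h for h in hits if h.reviewed and h.non_human_mammal]
    if any(h.protein_evidence for h in candidates):
        return "homologue_with_protein_evidence"
    if candidates:
        return "reviewed_homologue"
    # No qualifying homologue: treated as novel (assumed label).
    return "novel"


def percent_missing(missing: int, total: int) -> int:
    """Share of proteins lacking experimental support, as a rounded percentage."""
    return round(100 * missing / total)
```

With the counts quoted in the abstract, `percent_missing(170, 946)` reproduces the reported 18%. Checking the tiers in a fixed order keeps each protein in exactly one category, which is what allows the per-tier counts to be reported separately.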