Manda Srikanth Srinivas, Nirujogi Raja Sekhar, Pinto Sneha Maria, Kim Min-Sik, Datta Keshava K, Sirdeshmukh Ravi, Prasad T S Keshava, Thongboonkerd Visith, Pandey Akhilesh, Gowda Harsha
Institute of Bioinformatics, International Technology Park, Bangalore 560066, India.
J Proteome Res. 2014 Jul 3;13(7):3166-77. doi: 10.1021/pr401123v. Epub 2014 Jun 24.
The Chromosome-centric Human Proteome Project (C-HPP) is a global initiative to comprehensively characterize the proteins encoded by genes across all human chromosomes, with each team focusing on an individual chromosome. Here, we report mass spectrometry-based identification and characterization of proteins encoded by genes on chromosome 12. Our study is based on proteomic profiling of 30 different histologically normal human tissues and cell types using high-resolution mass spectrometry. In our analysis, we identified 1,535 proteins encoded by 836 genes on human chromosome 12. These include 89 genes designated as "missing proteins" by neXtProt because they had no prior evidence from either mass spectrometry or antibody-based detection methods. We identified several variant peptides that reflected coding SNPs annotated in the dbSNP database. We also confirmed the start sites of ∼200 proteins by identifying protein N-terminal acetylated peptides, and we identified alternative start sites for 11 proteins that had not previously been annotated in public databases. Most importantly, we identified 12 novel protein-coding regions on chromosome 12 using our proteogenomics strategy. All 12 regions are currently annotated as pseudogenes in public databases. This study demonstrates that there is scope for significantly improving the annotation of protein-coding genes in the human genome using mass spectrometry-derived data. Individual efforts within the C-HPP initiative should contribute significantly toward enriching human protein annotation. The data have been deposited to ProteomeXchange with identifier PXD000561.