Institute of Information Science, Academia Sinica, Taipei, Taiwan.
J Proteome Res. 2013 Jan 4;12(1):33-44. doi: 10.1021/pr300829r. Epub 2012 Dec 20.
Chromosome 4 is the fourth-largest human chromosome, containing approximately 191 megabases (~6.4% of the human genome) and 757 protein-coding genes. A number of marker genes for many diseases have been found on this chromosome. These include genes associated with genetic diseases (e.g., hepatocellular carcinoma) and genes relevant to biomedical research areas (cardiac system, aging, metabolic disorders, immune system, cancer and stem cells), such as oncogenes and growth factors. As a pilot study for the chromosome 4-centric human proteome project (Chr 4-HPP), we present here a systematic analysis of these 757 protein-coding genes, covering their disease associations, protein isoforms, coding single nucleotide polymorphisms, and experimental evidence at the protein level. We also describe how findings from the chromosome 4 project might be used to drive biomarker discovery and validation studies in disease-oriented projects, using secretomic and membrane proteomic approaches in cancer research as examples. By integrating cancer cell secretome data with several other existing public-domain databases, we identified 141 chromosome 4-encoded proteins as cancer cell-secretable/sheddable proteins. Additionally, we identified 54 chromosome 4-encoded proteins that have been classified as cancer-associated proteins and for which selected or multiple reaction monitoring (SRM/MRM) assays have been successfully developed. From literature annotation and topology analysis, 271 proteins were recognized as membrane proteins, while 27.9% of the 757 proteins have no experimental evidence at the protein level. In summary, the analysis revealed that chromosome 4 is a rich source of cancer-associated proteins for biomarker verification projects and drug target discovery projects.
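The cross-referencing described in the abstract amounts to comparing the chromosome 4 gene list against external gene sets. The minimal Python sketch below illustrates that set logic under stated assumptions. The function name `summarize`, the gene identifiers `g1` to `g5`, and the set memberships are hypothetical toy values, not the authors' pipeline or real annotations. The sketch also checks the arithmetic that 27.9% of 757 proteins corresponds to about 211 proteins lacking protein-level evidence.

```python
def summarize(chr4_genes, secretome_genes, evidence_genes):
    """Cross-reference chromosome 4 genes against two external gene sets.

    Hypothetical helper: intersection = chr4 genes seen in a secretome set;
    difference = chr4 genes with no entry in a protein-evidence set.
    """
    chr4 = set(chr4_genes)
    secreted = chr4 & set(secretome_genes)   # chr4 genes also found in the secretome set
    missing = chr4 - set(evidence_genes)     # chr4 genes with no protein-level evidence
    return {
        "secretable": sorted(secreted),
        "no_protein_evidence": sorted(missing),
        "fraction_missing": len(missing) / len(chr4),
    }


# Toy, made-up identifiers (not real genes or real annotations).
result = summarize(
    ["g1", "g2", "g3", "g4", "g5"],
    ["g1", "g2", "x9"],        # x9: a gene not on the toy chromosome
    ["g1", "g3", "g4"],
)
print(result)

# 27.9% of 757 proteins, rounded to a whole count of proteins.
print(round(0.279 * 757))
```

With the toy inputs, `g1` and `g2` are flagged as secretable, and `g2` and `g5` are flagged as lacking protein-level evidence (2 of 5 genes). Running the same arithmetic on the abstract's figures gives 211 of 757 proteins, because 211/757 ≈ 0.2787 rounds to 27.9%.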