Department of Gastroenterological Surgery, Graduate School of Medicine, Osaka University, 2-2 Yamadaoka E-2, Suita 565-0871, Osaka, Japan.
Int J Oncol. 2012 Feb;40(2):551-9. doi: 10.3892/ijo.2011.1244. Epub 2011 Oct 24.
In the post-genomic era, the main aim of cancer research is to organize the large amount of data on gene expression and protein abundance into a meaningful biological context. Integrated analysis of genomic and proteomic data sets remains a challenging task. To comprehensively assess the correlation between mRNA and protein expression, we used gene set enrichment analysis (GSEA), a recently described and powerful analytical method. When the differentially expressed proteins in 12 colorectal cancer (CRC) tissue samples were treated as a collective set, they exhibited significant concordance with primary tumor gene expression data from 180 CRC tissue samples. The 53 upregulated proteins were significantly enriched among genes with elevated expression levels (P<0.001, enrichment score [ES]=0.53), indicating a positive correlation between the proteomic and transcriptomic data. Similarly, the 44 downregulated proteins were significantly enriched among genes with reduced expression levels (P<0.001, ES=-0.65). We also applied GSEA to identify functional genetic pathways in CRC. A relatively large number of upregulated proteins were related to two principal pathways: ECM-receptor interaction, represented by heparan sulfate proteoglycan 2 and vitronectin, and ribosome, represented by RPL13, RPL27A, RPL4, RPS18, and RPS29. In conclusion, integrating genomic and proteomic data sets can lead to a better understanding of functional inference at the physiological level and of potential molecular targets in clinical settings.
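The sign of the enrichment scores reported above can be illustrated with a minimal sketch of a running-sum statistic. This is the unweighted, Kolmogorov-Smirnov-style form of the score, not necessarily the exact weighting or software settings used in the study, and the permutation step that yields the P values is omitted. The gene names, the ranking, and the function name `enrichment_score` are hypothetical and chosen only for illustration.

```python
from typing import Sequence, Set


def enrichment_score(ranked_genes: Sequence[str], gene_set: Set[str]) -> float:
    """Unweighted running-sum enrichment score (illustrative sketch only).

    ranked_genes: genes ordered from most upregulated to most downregulated
                  in the transcriptomic data.
    gene_set:     genes encoding the proteins of interest.
    Returns the running-sum value with the largest absolute deviation from 0.
    """
    n = len(ranked_genes)
    hits = sum(1 for g in ranked_genes if g in gene_set)
    if hits == 0 or hits == n:
        raise ValueError("gene set must contain some, but not all, ranked genes")

    hit_step = 1.0 / hits          # increment when a set member is encountered
    miss_step = 1.0 / (n - hits)   # decrement for every other gene

    running = 0.0
    best = 0.0
    for gene in ranked_genes:
        running += hit_step if gene in gene_set else -miss_step
        if abs(running) > abs(best):
            best = running
    return best


# Hypothetical toy ranking: most upregulated in tumor first.
ranked = ["G1", "G2", "G3", "G4", "G5", "G6", "G7", "G8"]
up_proteins = {"G1", "G2", "G4"}     # hypothetical upregulated protein set
down_proteins = {"G6", "G7", "G8"}   # hypothetical downregulated protein set

es_up = enrichment_score(ranked, up_proteins)
es_down = enrichment_score(ranked, down_proteins)
print(f"ES (upregulated set):   {es_up:.2f}")
print(f"ES (downregulated set): {es_down:.2f}")
```

In this toy ranking the upregulated set clusters near the top of the list and yields a positive score, while the downregulated set clusters near the bottom and yields a negative score, which mirrors the pattern of ES=0.53 and ES=-0.65 in the abstract.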