Mo Wen Juan, Fu Xu Ping, Han Xiao Tian, Yang Guang Yuan, Zhang Ji Gang, Guo Feng Hua, Huang Yan, Mao Yu Min, Li Yao, Xie Yi
State Key Laboratory of Genetic Engineering, Institute of Genetics, School of Life Science, Fudan University, Shanghai 200433, PR China.
BMC Genomics. 2009 Jul 29;10:340. doi: 10.1186/1471-2164-10-340.
Identifying differential gene co-expression patterns between cancer stages is an emerging approach to revealing the molecular mechanisms that underlie carcinogenesis. Most studies on this subject lack an algorithm for assessing statistical significance in a way that accounts for cancer progression. Without such an algorithm, it is difficult to identify precisely the gene pairs that correlate with cancer progression.
In this investigation we studied changes in gene pair co-expression using a stochastic process model that approximates the underlying dynamics of co-expression change during cancer progression. We also present a novel analytical method named 'Stochastic process model for Identifying differentially co-expressed Gene pair' (SIG method). The method was applied to two well-known prostate cancer data sets: hormone sensitive versus hormone resistant, and healthy versus cancerous. From these data sets, 428,582 gene pairs and 303,992 gene pairs were identified, respectively. We then applied two existing statistical methods to the same data sets; both were developed to identify differential co-expression of gene pairs, and neither considers cancer progression in its algorithm. We compared the results from three perspectives: progression analysis, gene pair identification effectiveness analysis, and pathway enrichment analysis. Statistical measures were used to quantify quality and performance from each perspective: Re-identification Scale (RS) and Progression Score (PS) in progression analysis, True Positive Rate (TPR) in gene pair analysis, and Pathway Enrichment Score (PES) in pathway analysis. Our results show small values of RS and large values of PS, TPR, and PES, suggesting that gene pairs identified by the SIG method are highly correlated with cancer progression and highly enriched in disease-specific pathways. Several gene interaction networks inferred in this research could provide clues to the mechanism of prostate cancer progression.
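For readers unfamiliar with differential co-expression, the general idea of testing whether one gene pair's correlation differs between two stages can be sketched as follows. This is not the SIG method, whose stochastic process model is not specified in this abstract; it is a minimal illustration that swaps in the standard Fisher z-transformation test for two independent Pearson correlations. The function names and the expression values are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def fisher_z_diff(r1, n1, r2, n2):
    """Compare two independent correlations (Fisher z-test).

    Returns the z statistic and a two-sided p-value.
    Requires n1 > 3 and n2 > 3.
    """
    # Clamp so atanh stays finite when a correlation is exactly +/-1.
    clamp = lambda r: max(min(r, 0.999999), -0.999999)
    z1, z2 = math.atanh(clamp(r1)), math.atanh(clamp(r2))
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail
    return z, p

# Hypothetical expression values for one gene pair in two stages.
stage_a_g1 = [1, 2, 3, 4, 5, 6]
stage_a_g2 = [1.1, 2.3, 2.9, 4.2, 4.8, 6.1]   # co-expressed in stage A
stage_b_g1 = [1, 2, 3, 4, 5, 6]
stage_b_g2 = [6.2, 4.9, 4.1, 3.2, 1.8, 1.1]   # anti-correlated in stage B

r_a = pearson(stage_a_g1, stage_a_g2)
r_b = pearson(stage_b_g1, stage_b_g2)
z, p = fisher_z_diff(r_a, len(stage_a_g1), r_b, len(stage_b_g1))
print(f"r_A={r_a:.3f}  r_B={r_b:.3f}  z={z:.2f}  p={p:.3g}")
```

A pairwise test of this kind treats the two stages as unordered groups; as the abstract notes, such approaches do not take cancer progression into account.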
The SIG method reliably identifies gene pairs correlated with cancer progression and performs well in both gene pair ontology analysis and pathway enrichment analysis. By tracking the process of cancer progression, the method provides an effective means of understanding the molecular mechanism of carcinogenesis.