Institute of Systems Biology and Bioinformatics, National Central University, Zhongli, Taiwan.
PLoS One. 2013 Jun 14;8(6):e65683. doi: 10.1371/journal.pone.0065683. Print 2013.
Significantly expressed genes extracted from microarray gene expression data have proved very useful for identifying genetic biomarkers of diseases, including cancer. However, deriving a disease-related inference from a list of differentially expressed genes has proven less than straightforward. In a systems disease such as cancer, how genes interact with each other should matter just as much as the level of gene expression. Here, in a novel approach, we used the network and disease progression properties of individual genes in state-specific gene-gene interaction networks (GGINs) to select cancer genes for human colorectal cancer (CRC), and obtained a much higher hit rate of known cancer genes than methods not based on network theory. We constructed GGINs by integrating gene expression microarray data from multiple states, namely healthy control (Nor), adenoma (Ade), inflammatory bowel disease (IBD) and CRC, with a protein-protein interaction database and Gene Ontology. We tracked changes in the network degrees and clustering coefficients of individual genes in the GGINs as the disease state changed from one to another. From these changes we inferred that the state sequences Nor-Ade-CRC and Nor-IBD-CRC both exhibited a trend of (disease) progression (ToP) toward CRC, and we devised a ToP procedure for selecting cancer genes for CRC. Of the 141 candidates selected using ToP, approximately 50% had literature support as cancer genes, compared to hit rates of 20% to 30% for standard methods using only gene expression data. Among the 16 candidate cancer genes that encoded transcription factors, 13 were known to be tumorigenic and three were novel: CDK1, SNRPF, and ILF2. We identified 13 of the 141 predicted cancer genes as candidate markers for early detection of CRC, 11 and 2 at the Ade and IBD states, respectively.
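To make the two tracked quantities concrete, the following is a minimal sketch of computing a gene's network degree and local clustering coefficient in each state-specific network and listing them along a state sequence such as Nor-Ade-CRC. It is not the ToP procedure itself, whose selection criteria are not given in this abstract. The gene names (`G1` to `G5`), the toy edge lists, and the helper names `build_adjacency`, `clustering_coefficient` and `track` are all hypothetical and chosen only for illustration.

```python
from itertools import combinations


def build_adjacency(edges):
    """Turn an undirected edge list into a dict mapping node -> set of neighbours."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree(adj, node):
    """Number of interaction partners of `node` (0 if the node is absent)."""
    return len(adj.get(node, ()))


def clustering_coefficient(adj, node):
    """Fraction of pairs of neighbours of `node` that are themselves linked."""
    nbrs = adj.get(node, set())
    k = len(nbrs)
    if k < 2:
        return 0.0  # undefined for fewer than two neighbours; reported as 0 here
    linked = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return 2.0 * linked / (k * (k - 1))


def track(networks, sequence, gene):
    """Return (state, degree, clustering coefficient) for `gene` along `sequence`."""
    profile = []
    for state in sequence:
        adj = build_adjacency(networks[state])
        profile.append((state, degree(adj, gene), clustering_coefficient(adj, gene)))
    return profile


# Hypothetical toy networks, one edge list per disease state (not real data).
TOY_NETWORKS = {
    "Nor": [("G1", "G2"), ("G1", "G3")],
    "Ade": [("G1", "G2"), ("G1", "G3"), ("G2", "G3"), ("G1", "G4")],
    "CRC": [("G1", "G2"), ("G1", "G3"), ("G2", "G3"), ("G1", "G4"),
            ("G3", "G4"), ("G1", "G5")],
}

profile = track(TOY_NETWORKS, ["Nor", "Ade", "CRC"], "G1")
for state, deg, cc in profile:
    print(f"{state}: degree={deg}, clustering={cc:.3f}")
```

In this toy example the degree of `G1` rises steadily along the sequence, which is the kind of state-to-state change that a progression-based selection rule could examine. Any actual thresholds or ranking criteria would have to come from the full paper.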