Sonachalam Madhankumar, Shen Jeffrey, Huang Hui, Wu Xiaogang
School of Informatics, Indiana University Indianapolis, IN, USA.
Front Genet. 2012 May 17;3:80. doi: 10.3389/fgene.2012.00080. eCollection 2012.
In this work, we integrated prior knowledge from gene signatures and protein interactions with gene set enrichment analysis (GSEA) and gene/protein network modeling to identify gene network signatures from gene expression microarray data. We demonstrated how to apply this approach to discover gene network signatures for colorectal cancer (CRC) from microarray datasets. First, we used GSEA to test the microarray data for enrichment of differentially expressed genes in CRC-related gene sets drawn from two publicly available, up-to-date gene set databases: the Molecular Signatures Database (MSigDB) and the Gene Signatures Database (GeneSigDB). Second, we compared the enriched gene sets by enrichment score, false-discovery rate, and nominal p-value. Third, we constructed an integrated protein-protein interaction (PPI) network by connecting the enriched genes through high-quality interactions from a human annotated and predicted protein interaction database, in which each interaction is labeled with a confidence score. Finally, we mapped differential gene expression onto the constructed network to build a comprehensive network model containing visualized transcriptome and proteome data. The results show that, although MSigDB has more CRC-relevant gene sets than GeneSigDB, the integrated PPI network connecting the enriched genes from both MSigDB and GeneSigDB provides a more complete view for discovering gene network signatures. We also found several important sub-network signatures for CRC, such as the TP53, PCNA, and IL8 sub-networks, corresponding to apoptosis, DNA repair, and immune response, respectively.
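The four steps described above can be sketched in a few lines of Python. This is only an illustrative outline under stated assumptions, not the authors' implementation: the gene set names, scores, interactions, fold changes, and all three cutoffs (`max_fdr`, `max_p`, `min_conf`) are hypothetical toy values, and the function names are invented for this example.

```python
from collections import defaultdict

# Hypothetical GSEA summary rows (toy values, not results from the paper):
# (gene set name, enrichment score, FDR, nominal p-value, member genes)
gsea_results = [
    ("SET_A", 0.62, 0.04, 0.001, {"TP53", "PCNA", "MDM2"}),
    ("SET_B", 0.35, 0.40, 0.200, {"IL8", "CXCL1"}),
    ("SET_C", 0.55, 0.10, 0.010, {"IL8", "TP53"}),
]

# Hypothetical scored interactions: (protein A, protein B, confidence)
interactions = [
    ("TP53", "MDM2", 0.95),
    ("TP53", "PCNA", 0.80),
    ("IL8", "CXCL1", 0.90),
    ("IL8", "TP53", 0.30),
    ("PCNA", "GAPDH", 0.99),
]

# Hypothetical log2 fold changes from the microarray comparison
log2_fc = {"TP53": -1.2, "PCNA": 1.8, "MDM2": 0.9, "IL8": 2.5}


def enriched_genes(results, max_fdr=0.25, max_p=0.05):
    """Steps 1-2: pool genes from gene sets passing both cutoffs (assumed values)."""
    genes = set()
    for _name, _score, fdr, p_value, members in results:
        if fdr <= max_fdr and p_value <= max_p:
            genes |= members
    return genes


def build_network(genes, edges, min_conf=0.75):
    """Step 3: keep confident interactions whose two endpoints are both enriched."""
    network = defaultdict(dict)
    for a, b, conf in edges:
        if a in genes and b in genes and conf >= min_conf:
            network[a][b] = conf
            network[b][a] = conf
    return dict(network)


def annotate(network, fold_changes):
    """Step 4: attach a differential expression value to every network node."""
    return {gene: fold_changes.get(gene) for gene in network}


genes = enriched_genes(gsea_results)
network = build_network(genes, interactions)
node_expression = annotate(network, log2_fc)
```

In this toy run, IL8 is enriched but drops out of the network because its only interaction with another enriched gene falls below the assumed confidence cutoff, which shows how the choice of threshold shapes the resulting sub-networks.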