Gu Jin, Chen Yang, Li Shao, Li Yanda
MOE Key Laboratory of Bioinformatics and Bioinformatics Division, Tsinghua National Laboratory for Information Science and Technology (TNLIST) and Department of Automation, Tsinghua University, Beijing 100084, China.
BMC Syst Biol. 2010 Apr 21;4:47. doi: 10.1186/1752-0509-4-47.
Cell responses to environmental stimuli are usually organized at the molecular level as relatively separate responsive gene modules. Identifying responsive gene modules, rather than individual differentially expressed (DE) genes, provides important information about the underlying molecular mechanisms. Most current methods formulate module identification as an optimization problem: find the active sub-networks in the genome-wide gene network by maximizing an objective function that considers gene differential expression and/or gene-gene co-expression information. Here we present a new formulation of this task: a group of closely connected and co-expressed DE genes in the gene network is regarded as the signature of an underlying responsive gene module; the modules can then be identified by finding these signatures and recovering the "missing parts", i.e., adding the intermediate genes that connect the DE genes in the gene network.
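The two-step formulation above can be illustrated with a small Python sketch. This is a simplified illustration under stated assumptions, not the ClustEx implementation: the thresholds `max_hops`, `min_corr` and `min_size`, the rule that links two DE genes when they lie within a few hops of each other in the network and have a high Pearson correlation, and the choice of one BFS shortest path per gene pair for recovering intermediate genes are all hypothetical choices made for this example.

```python
from collections import deque
from itertools import combinations


def bfs(graph, source):
    """Return hop distances and one BFS parent per node reachable from source."""
    dist, parent = {source: 0}, {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in sorted(graph.get(node, ())):  # sorted for deterministic paths
            if nb not in dist:
                dist[nb] = dist[node] + 1
                parent[nb] = node
                queue.append(nb)
    return dist, parent


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def find_signatures(graph, de_genes, expr, max_hops=2, min_corr=0.8, min_size=2):
    """Step 1 (hypothetical rule): group DE genes that are close AND co-expressed."""
    de = sorted(de_genes)
    bfs_cache = {g: bfs(graph, g) for g in de}
    linked = {g: set() for g in de}
    for a, b in combinations(de, 2):
        hops = bfs_cache[a][0].get(b)
        if hops is not None and hops <= max_hops and pearson(expr[a], expr[b]) >= min_corr:
            linked[a].add(b)
            linked[b].add(a)
    # Signatures = connected components of the DE-gene link graph.
    seen, signatures = set(), []
    for g in de:
        if g in seen:
            continue
        comp, stack = set(), [g]
        while stack:
            node = stack.pop()
            if node not in comp:
                comp.add(node)
                stack.extend(linked[node] - comp)
        seen |= comp
        if len(comp) >= min_size:
            signatures.append(comp)
    return signatures, bfs_cache


def recover_module(signature, bfs_cache):
    """Step 2: add the intermediate genes on one shortest path between each DE pair."""
    module = set(signature)
    for a, b in combinations(sorted(signature), 2):
        dist, parent = bfs_cache[a]
        if b not in dist:
            continue
        node = parent[b]
        while node is not None and node != a:  # walk back toward a, skipping endpoints
            module.add(node)
            node = parent[node]
    return module


# Toy example: A, B, D are DE genes; X, Y, Z are non-DE network genes.
edges = [("A", "X"), ("X", "B"), ("B", "Y"), ("Y", "Z"), ("Z", "D")]
graph = {}
for u, v in edges:
    graph.setdefault(u, set()).add(v)
    graph.setdefault(v, set()).add(u)
expr = {"A": [1, 2, 3, 4], "B": [2, 4, 6, 8], "D": [4, 3, 2, 1]}

signatures, cache = find_signatures(graph, {"A", "B", "D"}, expr)
modules = [recover_module(s, cache) for s in signatures]
print(signatures, modules)
```

In this toy network, A and B are two hops apart and perfectly correlated, so they form a signature, and step 2 recovers the non-DE gene X that connects them. D is too far from the others under the assumed `max_hops=2`, so it forms no signature.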
ClustEx, a two-step method based on the new formulation, was developed and applied to identify the responsive gene modules of human umbilical vein endothelial cells (HUVECs) in inflammation and angiogenesis models by integrating time-course microarray data and genome-wide protein-protein interaction (PPI) data. Tested on reference responsive gene sets, it outperforms several available module identification tools. Gene set analysis of KEGG pathways, GO terms and microRNA (miRNA) target gene sets further supports the ClustEx predictions.
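The abstract does not say which statistic the gene set analysis uses. One common choice for asking whether a module overlaps a KEGG pathway, GO term or miRNA target set more than expected by chance is a one-sided hypergeometric test; the sketch below assumes that test and a hypothetical function name `enrichment_pvalue`, and omits multiple-testing correction.

```python
from math import comb


def enrichment_pvalue(module, gene_set, background):
    """One-sided hypergeometric P(overlap >= observed), an assumed enrichment test."""
    background = set(background)
    module = set(module) & background
    gene_set = set(gene_set) & background
    N, K, n = len(background), len(gene_set), len(module)
    k = len(module & gene_set)  # observed overlap
    # Sum the hypergeometric probabilities of overlaps k, k+1, ..., min(n, K).
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / comb(N, n)


# Toy example: 20 background genes, a 5-gene set, a 4-gene module sharing 3 genes.
background = {f"g{i}" for i in range(20)}
gene_set = {f"g{i}" for i in range(5)}
module = {"g0", "g1", "g2", "g10"}
p = enrichment_pvalue(module, gene_set, background)
print(p)
```

Here the p-value is (C(5,3)·C(15,1) + C(5,4)·C(15,0)) / C(20,4) = 155/4845, roughly 0.032.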
Taking the closely-connected and co-expressed DE genes in the condition-specific gene network as the signatures of the underlying responsive gene modules provides a new strategy to solve the module identification problem. The identified responsive gene modules of HUVECs and the corresponding enriched pathways/miRNAs provide useful resources for understanding the inflammatory and angiogenic responses of vascular systems.