Key Laboratory of Systems Biology, SIBS-Novo Nordisk Translational Research Centre for PreDiabetes, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China.
J Am Med Inform Assoc. 2013 Jul-Aug;20(4):659-67. doi: 10.1136/amiajnl-2012-001168. Epub 2012 Sep 11.
Many methods have been developed to identify disease genes, and further module biomarkers of complex diseases, based on gene expression data. However, it is generally difficult to distinguish whether variations in gene expression are causative or merely an effect of a disease. This limitation of relying on gene expression data alone highlights the need for new approaches that can exploit various types of data to reflect the causal relationship between network modules and disease traits.
In this work, we developed a novel network-based approach to identify putative causal module biomarkers of complex diseases by integrating heterogeneous information, for example, epigenomic data, gene expression data, and a protein-protein interaction network. We first formulated the identification of modules as a mathematical programming problem, which can be solved efficiently and accurately. We then applied our approach to colorectal cancer (CRC) and identified several network modules that can serve as potential module biomarkers for characterizing CRC. Further validation using three additional gene expression datasets verified their candidate biomarker properties and the effectiveness of the method. Functional enrichment analysis also revealed that the identified modules are strongly related to hallmarks of cancer, and that the enriched functions, such as inflammatory response and receptor and signaling pathways, are specific to CRC.
By constructing a transcription factor (TF)-module network, we found that aberrant DNA methylation of genes encoding TFs contributes considerably to the activity change of some genes, which may function as causal genes of CRC and could also be exploited to develop efficient therapies or effective drugs.
Our method can potentially be extended to the study of other complex diseases and the multiclassification problem.