Wei Zhi, Li Hongzhe
Genomics and Computational Biology Graduate Group, University of Pennsylvania School of Medicine, Philadelphia, PA 19104, USA.
Bioinformatics. 2007 Jun 15;23(12):1537-44. doi: 10.1093/bioinformatics/btm129. Epub 2007 May 5.
A central problem in genomic research is the identification of genes and pathways involved in diseases and other biological processes. The identified genes, or their univariate test statistics, are often linked to known biological pathways through gene set enrichment analysis in order to identify the pathways involved. However, most procedures for identifying differentially expressed (DE) genes do not use the known pathway information when identifying such genes. In this article, we develop a Markov random field (MRF)-based method for identifying genes and subnetworks that are related to diseases. The procedure models the dependency of the DE patterns of genes on the networks using a local discrete MRF model.
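To make the idea of a local discrete MRF over DE states concrete, the following is a minimal sketch, not the authors' implementation. It assumes an auto-logistic form for the conditional prior (log-odds of a gene being DE depend on how many of its network neighbors are DE), a unit-variance normal model for the per-gene test statistic, and a simple iterated conditional modes (ICM) update. The function names, the parameters `gamma`, `beta`, and `mu_de`, and the toy scores are all hypothetical and do not come from the abstract.

```python
import math


def normal_logpdf(x, mean, sd=1.0):
    """Log density of a normal distribution."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def conditional_log_odds(i, z, neighbors, scores, gamma, beta, mu_de):
    """Log-odds that gene i is DE given its neighbors' states and its own score.

    Assumed prior (auto-logistic): gamma + beta * (#DE neighbors - #non-DE neighbors).
    Assumed likelihood: score ~ N(mu_de, 1) if DE, N(0, 1) otherwise.
    """
    n_de = sum(z[j] for j in neighbors[i])
    n_ee = len(neighbors[i]) - n_de
    prior = gamma + beta * (n_de - n_ee)
    likelihood = normal_logpdf(scores[i], mu_de) - normal_logpdf(scores[i], 0.0)
    return prior + likelihood


def icm(scores, neighbors, gamma=-1.0, beta=1.0, mu_de=3.0, max_sweeps=50):
    """Iterated conditional modes: update each gene's state until nothing changes."""
    # Start from the network-free call: DE if the likelihood alone favors it.
    z = [1 if normal_logpdf(s, mu_de) > normal_logpdf(s, 0.0) else 0 for s in scores]
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(scores)):
            new_state = 1 if conditional_log_odds(
                i, z, neighbors, scores, gamma, beta, mu_de) > 0 else 0
            if new_state != z[i]:
                z[i] = new_state
                changed = True
        if not changed:
            break
    return z


# Toy pathway: five genes on a chain 0-1-2-3-4 (hypothetical data).
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
scores = [3.2, 1.4, 3.0, 0.1, -0.3]

print(icm(scores, neighbors))
```

In this toy example, gene 1 has a borderline score and would be called non-DE on its own, but it sits between two genes with strong evidence, so the neighbor term shifts its call to DE. This is the sense in which a network-aware model can gain sensitivity over gene-by-gene testing; the actual model and estimation procedure in the article may differ.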
Simulation studies indicated that the method is effective in identifying genes and subnetworks that are related to disease, with higher sensitivity and lower false discovery rates than commonly used procedures that do not use the pathway structure information. Applied to two breast cancer microarray gene expression datasets, the method identified several subnetworks on several of the KEGG transcriptional pathways that are related to breast cancer recurrence or to survival due to breast cancer.
The proposed MRF-based model efficiently utilizes the known pathway structures in identifying the DE genes and the subnetworks that might be related to phenotype. As more biological networks are identified and documented in databases, the proposed method should find more applications in identifying the subnetworks that are related to diseases and other biological processes.