Lee Eunjung, Chuang Han-Yu, Kim Jong-Won, Ideker Trey, Lee Doheon
Department of Bio and Brain Engineering, KAIST, Daejeon, South Korea.
PLoS Comput Biol. 2008 Nov;4(11):e1000217. doi: 10.1371/journal.pcbi.1000217. Epub 2008 Nov 7.
The advent of microarray technology has made it possible to classify disease states based on gene expression profiles of patients. Typically, marker genes are selected by measuring the power of their expression profiles to discriminate among patients of different disease states. However, expression-based classification can be challenging in complex diseases due to factors such as cellular heterogeneity within a tissue sample and genetic heterogeneity across patients. A promising technique for coping with these challenges is to incorporate pathway information into the disease classification procedure in order to classify disease based on the activity of entire signaling pathways or protein complexes rather than on the expression levels of individual genes or proteins. We propose a new classification method based on pathway activities inferred for each patient. For each pathway, an activity level is summarized from the gene expression levels of its condition-responsive genes (CORGs), defined as the subset of genes in the pathway whose combined expression delivers optimal discriminative power for the disease phenotype. We show that classifiers using pathway activity achieve better performance than classifiers based on individual gene expression, for both simple and complex case-control studies including differentiation of perturbed from non-perturbed cells and subtyping of several different kinds of cancer. Moreover, the new method outperforms several previous approaches that use a static (i.e., non-conditional) definition of pathways. Within a pathway, the identified CORGs may facilitate the development of better diagnostic markers and the discovery of core alterations in human disease.
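The abstract does not spell out how the CORG subset is found or how its expression is summarized, so the following is only a minimal sketch under stated assumptions: each gene is z-normalized across patients, discriminative power is measured with a two-sample (Welch) t-statistic, genes are added greedily in order of their individual t-scores until the score stops improving, and the activity of a gene set is the sum of its z-scores divided by the square root of the set size. The function names `t_score` and `infer_pathway_activity` are hypothetical and chosen for illustration.

```python
import numpy as np


def t_score(values, labels):
    """Welch t-statistic of `values` between phenotype 1 and phenotype 0."""
    a, b = values[labels == 1], values[labels == 0]
    se = np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
    return (a.mean() - b.mean()) / se


def infer_pathway_activity(expr, labels):
    """Greedy CORG-style selection for one pathway (illustrative assumption).

    expr   : array of shape (n_genes, n_samples), expression of pathway genes
    labels : array of shape (n_samples,), 0/1 phenotype per patient
    Returns (selected gene indices, activity per sample, discriminative score).
    """
    # Assumption: z-normalize each gene across samples.
    mean = expr.mean(axis=1, keepdims=True)
    std = expr.std(axis=1, ddof=1, keepdims=True)
    z = (expr - mean) / std

    # Score each gene alone, then orient the search by the pathway's
    # overall direction of change (sign of the mean t-score).
    scores = np.array([t_score(gene, labels) for gene in z])
    direction = 1.0 if scores.mean() >= 0 else -1.0
    order = np.argsort(-direction * scores)

    # Start from the single best gene in that direction.
    selected = [int(order[0])]
    activity = z[order[0]]
    best = direction * t_score(activity, labels)

    # Add genes one at a time; stop as soon as the score fails to improve.
    for g in order[1:]:
        trial = selected + [int(g)]
        trial_activity = z[trial].sum(axis=0) / np.sqrt(len(trial))
        s = direction * t_score(trial_activity, labels)
        if s <= best:
            break
        selected, activity, best = trial, trial_activity, s

    return selected, activity, best
```

Under these assumptions, the per-patient activity vector returned for each pathway would replace the individual gene expression values as the feature fed to a downstream classifier. In practice the selection would need to be done on training data only, so that the CORGs are not chosen using the patients on which the classifier is evaluated.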