Breslin Thomas, Krogh Morten, Peterson Carsten, Troein Carl
Complex Systems Division, Department of Theoretical Physics, University of Lund, Sölvegatan 14A, SE-223 62 Lund, Sweden.
BMC Bioinformatics. 2005 Jun 29;6:163. doi: 10.1186/1471-2105-6-163.
Signal transduction pathways convey information from the outside of the cell to transcription factors, which in turn regulate gene expression. Our objective is to analyze tumor gene expression data from microarrays in the context of such pathways.
We use pathways compiled from the TRANSPATH/TRANSFAC databases and the literature, and three publicly available cancer microarray data sets. Variation in pathway activity across the samples is gauged by the degree of correlation between downstream targets of a pathway. Two correlation scores are applied: one considers all pairs of downstream targets, and the other considers only pairs without common transcription factors. Using these scores, several pathways are found to be differentially active in the data sets. Moreover, we devise a score for pathway activity in individual samples, based on the average expression value of the downstream targets. Statistical significance is assigned to the scores using permutation of genes as the null model. Hence, for individual samples, the status of a pathway is given as a sign, + or -, and a p-value. This approach defines a projection of high-dimensional gene expression data onto low-dimensional pathway activity scores. For each data set and many pathways, we find a much larger number of significant samples than expected by chance. Finally, we find that several sample-wise pathway activities are significantly associated with clinical classifications of the samples.
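The scores summarized above can be illustrated with a minimal sketch. It is not the authors' published definition; it assumes that (a) the correlation score is the plain mean Pearson correlation over pairs of downstream targets, (b) the second variant simply skips pairs that share a transcription factor, using `tf_of`, a hypothetical gene-to-TF mapping standing in for TRANSFAC annotations, (c) the sample-wise score is the mean of per-gene z-scored expression values of the targets, and (d) the gene-permutation null draws random gene sets of the same size, giving a two-sided empirical p-value with a +1 correction. All function names are hypothetical.

```python
import math
import random
from itertools import combinations
from statistics import fmean, pstdev


def standardize(row):
    """Z-score one gene's expression across samples (assumes a non-constant row)."""
    mu, sd = fmean(row), pstdev(row)
    return [(x - mu) / sd for x in row]


def pearson(x, y):
    """Pearson correlation as the mean product of population z-scores."""
    return fmean([a * b for a, b in zip(standardize(x), standardize(y))])


def pair_correlation_score(expr, targets, tf_of=None):
    """Mean correlation over pairs of downstream targets.

    expr:  dict gene -> list of expression values (one per sample).
    tf_of: optional hypothetical dict gene -> set of TFs; when given, pairs
           sharing any TF are skipped (the second score variant).
    """
    vals = []
    for g, h in combinations(targets, 2):
        if tf_of is not None and tf_of.get(g, set()) & tf_of.get(h, set()):
            continue  # pair has a common transcription factor
        vals.append(pearson(expr[g], expr[h]))
    return fmean(vals) if vals else math.nan


def sample_scores(z, genes):
    """Per-sample score: mean z-scored expression over the given genes."""
    n_samples = len(next(iter(z.values())))
    return [fmean([z[g][j] for g in genes]) for j in range(n_samples)]


def sample_activity(expr, targets, n_perm=1000, seed=None):
    """Sign (+/-) and two-sided permutation p-value for every sample.

    Null model (assumed): random gene sets of the same size drawn from all genes.
    """
    rng = random.Random(seed)
    z = {g: standardize(v) for g, v in expr.items()}
    observed = sample_scores(z, targets)
    genes = list(z)
    exceed = [0] * len(observed)
    for _ in range(n_perm):
        null = sample_scores(z, rng.sample(genes, len(targets)))
        for j, (o, s) in enumerate(zip(observed, null)):
            if abs(s) >= abs(o):
                exceed[j] += 1
    signs = ["+" if o >= 0 else "-" for o in observed]
    pvals = [(e + 1) / (n_perm + 1) for e in exceed]
    return signs, pvals
```

Calling `sample_activity(expr, targets)` for one pathway yields one sign and one p-value per sample; repeating it for every pathway gives the pathways-by-samples projection described in the abstract. The same permutation scheme could be wrapped around `pair_correlation_score` to attach p-values to the correlation scores as well.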
This study shows that it is feasible to infer signal transduction pathway activity, in individual samples, from gene expression data. Furthermore, these pathway activities are biologically relevant in the three cancer data sets.