Bidaut Ghislain, Suhre Karsten, Claverie Jean-Michel, Ochs Michael F
Fox Chase Cancer Center, 333 Cottman Avenue, Philadelphia, PA 19111, USA.
BMC Bioinformatics. 2006 Feb 28;7:99. doi: 10.1186/1471-2105-7-99.
As numerous diseases involve errors in signal transduction, modern therapeutics often target proteins involved in cellular signaling. Interpretation of the activity of signaling pathways during disease development or therapeutic intervention would assist in drug development, design of therapy, and target identification. Microarrays provide a global measure of cellular response; however, linking these responses to signaling pathways requires an analytic approach tuned to the underlying biology. An ongoing issue in pattern recognition in microarrays has been how to determine the number of patterns (or clusters) to use for data interpretation. This is a critical issue, as measures of statistical significance in gene ontology or pathways rely on proper separation of genes into groups.
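To illustrate why significance measures depend on how genes are separated into groups, the sketch below computes a one-sided hypergeometric enrichment p-value for an annotation term within a gene group. This is a commonly used enrichment test and is an assumption here, not necessarily the statistic used in this work; the function name and the gene counts are hypothetical.

```python
from math import comb


def enrichment_p_value(population, annotated, group_size, overlap):
    """One-sided hypergeometric tail probability P(X >= overlap).

    population: total number of genes measured
    annotated:  genes in the population carrying the annotation term
    group_size: genes assigned to the pattern (or cluster) being tested
    overlap:    genes in the group that carry the term
    """
    upper = min(annotated, group_size)
    total = comb(population, group_size)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, group_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical numbers: the same 10 annotated genes fall in a tight group
# of 100 genes or in a diluted group of 400 genes.
p_tight = enrichment_p_value(6000, 60, 100, 10)
p_diluted = enrichment_p_value(6000, 60, 400, 10)
print(p_tight < p_diluted)
```

With the same overlap, the diluted group gives a larger p-value, so merging or splitting patterns incorrectly can hide or inflate an annotation signal.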
Here we introduce a method relying on gene annotation coupled to decompositional analysis of global gene expression data that allows us to estimate specific activity of strongly coupled signaling pathways and, in some cases, activity of specific signaling proteins. We demonstrate the technique using the Rosetta yeast deletion mutant data set, decompositional analysis by Bayesian Decomposition, and annotation analysis using ClutrFree. From measurements of gene persistence in patterns across multiple potential dimensionalities, we determined that 15 basis vectors provide the correct dimensionality for interpreting the data. Using gene ontology and data on gene regulation in the Saccharomyces Genome Database, we identified the transcriptional signatures of several cellular processes in yeast, including cell wall creation, ribosomal disruption, chemical blocking of protein synthesis, and, critically, individual signatures of the strongly coupled mating and filamentation pathways.
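The idea of tracking gene persistence across dimensionalities can be sketched as follows. The abstract does not define the persistence measure, so this example substitutes a simple stand-in: for each pattern found at one dimensionality, it takes the best Jaccard overlap with any pattern at the next dimensionality and averages those scores. The function names, the toy gene sets, and the use of Jaccard overlap are all assumptions for illustration; the code does not call Bayesian Decomposition or ClutrFree.

```python
def jaccard(a, b):
    """Jaccard overlap between two gene sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def persistence(patterns_by_k):
    """Mean best-match overlap between consecutive dimensionalities.

    patterns_by_k maps a dimensionality k to a non-empty list of gene sets,
    one set per pattern (hypothetical input format).
    """
    ks = sorted(patterns_by_k)
    scores = {}
    for k_low, k_high in zip(ks, ks[1:]):
        best = [
            max(jaccard(p, q) for q in patterns_by_k[k_high])
            for p in patterns_by_k[k_low]
        ]
        scores[(k_low, k_high)] = sum(best) / len(best)
    return scores


# Toy example: one pattern survives intact from k=2 to k=3, one splits.
toy = {
    2: [{"a", "b", "c"}, {"d", "e", "f"}],
    3: [{"a", "b", "c"}, {"d", "e"}, {"f"}],
}
scores = persistence(toy)
print(scores)
```

Under this stand-in measure, a dimensionality at which gene groups stop changing from one decomposition to the next would show persistence scores close to 1.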
This work demonstrates that microarray data can provide downstream indicators of pathway activity through the use of either gene ontology or transcription factor databases. This approach can be used to investigate the specificity and success of targeted therapeutics as well as to elucidate signaling activity in normal and disease processes.