Department of Bioengineering, Stanford University, Stanford, CA 94305, USA.
J Biomed Inform. 2010 Dec;43(6):932-44. doi: 10.1016/j.jbi.2010.07.001. Epub 2010 Jul 7.
As public microarray repositories rapidly accumulate gene expression data, these resources contain increasingly valuable information about cellular processes in human biology. This presents a unique opportunity for intelligent data mining methods to extract information about the transcriptional modules underlying these biological processes. Modeling cellular gene expression as a combination of functional modules, we use independent component analysis (ICA) to derive 423 fundamental components of human biology from a 9395-array compendium of heterogeneous expression data. Annotation using the Gene Ontology (GO) suggests that while some of these components represent known biological modules, others may describe biology not well characterized by existing manually curated ontologies. To understand the biological functions represented by these modules, we investigate the mechanism of the preclinical anti-cancer drug parthenolide (PTL) by analyzing the differential expression of our fundamental components. Our method correctly identifies known pathways and predicts that N-glycan biosynthesis and T-cell receptor signaling may contribute to PTL response. The fundamental gene modules we describe have the potential to provide pathway-level insight into new gene expression datasets.
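As a rough illustration of what "analyzing the differential expression of our fundamental components" could look like in practice, the sketch below ranks modules by how strongly their per-sample activity differs between treated and control arrays. It is not the authors' pipeline: the abstract does not name a test statistic, so Welch's t-statistic is an assumption here, and the module names, activity values, and the helper functions `welch_t` and `rank_modules` are all hypothetical. In a real analysis the activities would come from an ICA decomposition of the expression compendium.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic for two independent samples (unequal variances)."""
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error


def rank_modules(activity, treated, control):
    """Rank modules by the absolute t-statistic between two sample groups.

    activity: dict mapping module name -> list of per-sample activities
    treated, control: lists of sample indices into those activity lists
    """
    scores = {}
    for name, values in activity.items():
        treated_values = [values[i] for i in treated]
        control_values = [values[i] for i in control]
        scores[name] = welch_t(treated_values, control_values)
    # Largest absolute difference first; the sign gives the direction.
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Made-up activities for three hypothetical modules across six samples.
# Samples 0-2 stand in for drug-treated arrays, samples 3-5 for controls.
activity = {
    "module_A": [2.1, 1.9, 2.3, 0.2, 0.1, -0.1],
    "module_B": [0.0, 0.3, -0.2, 0.1, -0.1, 0.2],
    "module_C": [-1.0, -1.2, -0.9, 0.1, 0.0, 0.2],
}
treated = [0, 1, 2]
control = [3, 4, 5]

ranked = rank_modules(activity, treated, control)
for name, t_value in ranked:
    print(f"{name}: t = {t_value:.2f}")
```

Modules at the top of the ranking would be the candidates for pathway-level interpretation, for example through their GO annotations. With hundreds of modules, a multiple-testing correction would also be needed, which this sketch omits.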