Department of Cellular & Molecular Medicine, The Panum Institute, University of Copenhagen, Copenhagen, Denmark.
PLoS One. 2012;7(2):e32394. doi: 10.1371/journal.pone.0032394. Epub 2012 Feb 27.
Analyzing data obtained from genome-wide gene expression experiments is challenging due to the large number of variables, the need for multivariate analyses, and the demands of managing large amounts of data. Here we present the R package pcaGoPromoter, which facilitates the interpretation of genome-wide expression data and overcomes the aforementioned problems. In the first step, principal component analysis (PCA) is applied to survey differences between experiments and possible groupings. The next step is the interpretation of the principal components with respect to both biological function and regulation by predicted transcription factor binding sites. The robustness of the results is evaluated using cross-validation, and illustrative plots of PCA scores and gene ontology terms are available. pcaGoPromoter works with any platform that uses gene symbols or Entrez IDs as probe identifiers. In addition, support for several popular Affymetrix GeneChip platforms is provided. To illustrate the features of the pcaGoPromoter package, a serum stimulation experiment was performed and genome-wide gene expression in the resulting samples was profiled using the Affymetrix Human Genome U133 Plus 2.0 chip. Array data were analyzed using pcaGoPromoter package tools, resulting in a clear separation of the experiments into three groups: controls, serum only, and serum with inhibitor. Functional annotation of the axes in the PCA score plot showed the expected serum-promoted biological processes, e.g., cell cycle progression, and the predicted involvement of expected transcription factors, including E2F. In addition, unexpected results, e.g., cholesterol synthesis in serum-depleted cells and NF-κB activation in inhibitor-treated cells, were noted. In summary, the pcaGoPromoter R package provides a collection of tools for analyzing gene expression data. These tools give an overview of the input data via PCA, functional interpretation by gene ontology terms (biological processes), and an indication of the involvement of possible transcription factors.
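To make the first two steps concrete, here is a minimal Python sketch of the general technique; it is not the pcaGoPromoter R code. It assumes a log-scale expression matrix with samples as rows and genes/probes as columns, computes PCA by singular value decomposition, and then picks the genes with the most extreme loadings on one component. Treating those extreme genes as the input to a gene ontology or transcription factor binding site enrichment test is an assumption about how a component can be interpreted, and the helper names `pca_scores` and `extreme_genes` are hypothetical.

```python
import numpy as np


def pca_scores(expr, n_components=2):
    """PCA of a samples x genes expression matrix (e.g. log2 intensities).

    Returns (scores, loadings, explained): sample scores (samples x k),
    gene loadings (genes x k) and the fraction of total variance per PC.
    """
    x = np.asarray(expr, dtype=float)
    # Center every gene so the components describe differences between samples
    xc = x - x.mean(axis=0)
    # Thin SVD: xc = U @ diag(s) @ Vt; rows of Vt are the principal axes
    u, s, vt = np.linalg.svd(xc, full_matrices=False)
    k = min(n_components, s.size)
    scores = u[:, :k] * s[:k]          # sample coordinates for a PCA score plot
    loadings = vt[:k].T                # weight of each gene on each component
    explained = s[:k] ** 2 / np.sum(s ** 2)
    return scores, loadings, explained


def extreme_genes(loadings, gene_ids, component=0, n=10):
    """Hypothetical helper: the n genes with the most positive and most
    negative loadings on one component, i.e. candidate gene lists for an
    enrichment test on that axis of the score plot."""
    order = np.argsort(loadings[:, component])   # ascending loading
    negative = [gene_ids[i] for i in order[:n]]
    positive = [gene_ids[i] for i in order[::-1][:n]]
    return positive, negative
```

The sign of each SVD component is arbitrary, so the "positive" and "negative" lists can swap between libraries or runs; only the separation of samples along the axis and the genes driving it carry meaning. The cross-validation of robustness mentioned in the abstract is not part of this sketch.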