Gregory Alvord W, Roayaei Jean A, Quiñones Octavio A, Schneider Katherine T
Statistical Consulting Services, Data Management Services, Inc. (DMS), National Cancer Institute at Frederick (NCI-Frederick), PO Box B, Frederick, MD 21702-1201, USA.
Brief Bioinform. 2007 Nov;8(6):415-31. doi: 10.1093/bib/bbm043. Epub 2007 Sep 29.
This article describes specific procedures for conducting quality assessment of Affymetrix GeneChip® soybean genome data and for performing analyses to determine differential gene expression using the open-source R programming environment in conjunction with the open-source Bioconductor software. We describe procedures for extracting the Affymetrix probe set IDs on the Affymetrix soybean chip that relate specifically to the soybean genome, and we demonstrate the use of exploratory plots, including images of raw probe-level data, boxplots, density plots and M versus A (MA) plots. RNA degradation and the quality control procedures recommended by Affymetrix are discussed. An appropriate probe-level model provides an excellent quality assessment tool. To demonstrate this, we discuss and display chip pseudo-images of weights, residuals and signed residuals, along with additional probe-level modeling plots that may be used to identify aberrant chips. The Robust Multichip Averaging (RMA) procedure is used for background correction, normalization and summarization of the AffyBatch probe-level data to obtain expression level data and to discover differentially expressed genes. Examples of boxplots and MA plots are presented for the expression level data. Volcano plots and heatmaps are used to demonstrate the use of (log) fold changes in conjunction with ordinary and moderated t-statistics for determining interesting genes. We show, with real data, how functions implemented in R and Bioconductor successfully identified differentially expressed genes that may play a role in soybean resistance to a fungal pathogen, Phakopsora pachyrhizi. Complete source code for performing all quality assessment and statistical procedures may be downloaded from our web source: http://css.ncifcrf.gov/services/download/MicroarraySoybean.zip.
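The selection idea behind the volcano plots, combining a log fold change with a t-statistic, can be illustrated with a minimal sketch. This is not the authors' code, which is written in R and Bioconductor and is available from the URL above; it is a plain Python stand-in that covers only the ordinary (not the moderated) t-statistic. The probe names, expression values and both thresholds are hypothetical and chosen purely for illustration.

```python
import math
from statistics import mean, variance

# Hypothetical log2 expression values per probe set: (treated, control).
# Names and numbers are invented for illustration, not taken from the article.
EXPRESSION = {
    "probe_a": ([9.1, 9.3, 9.2], [7.0, 7.2, 7.1]),  # large, consistent shift
    "probe_b": ([8.0, 8.1, 7.9], [8.0, 7.9, 8.1]),  # no shift
    "probe_c": ([6.0, 9.0, 7.5], [6.4, 6.6, 6.5]),  # large but noisy shift
}


def ma_values(x, y):
    """Return (M, A) for two log2 intensities: log ratio and mean log intensity."""
    return x - y, (x + y) / 2


def log2_fold_change(treated, control):
    """Difference of mean log2 expression, treated minus control."""
    return mean(treated) - mean(control)


def ordinary_t(treated, control):
    """Two-sample pooled-variance (Student) t-statistic on log2 values.

    Assumes the pooled variance is non-zero.
    """
    n1, n2 = len(treated), len(control)
    pooled = ((n1 - 1) * variance(treated)
              + (n2 - 1) * variance(control)) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled * (1 / n1 + 1 / n2))
    return (mean(treated) - mean(control)) / standard_error


def select_genes(data, min_abs_lfc=1.0, min_abs_t=4.0):
    """Keep probes whose fold change AND t-statistic both pass the cutoffs.

    The cutoffs are illustrative assumptions, not values from the article.
    """
    selected = []
    for probe, (treated, control) in data.items():
        lfc = log2_fold_change(treated, control)
        t = ordinary_t(treated, control)
        if abs(lfc) >= min_abs_lfc and abs(t) >= min_abs_t:
            selected.append(probe)
    return selected


if __name__ == "__main__":
    for probe, (treated, control) in EXPRESSION.items():
        print(probe,
              round(log2_fold_change(treated, control), 3),
              round(ordinary_t(treated, control), 3))
    print("selected:", select_genes(EXPRESSION))
```

In this toy data, `probe_c` has a fold change close to that of a genuinely shifted gene but a small t-statistic because its replicates disagree, which is the kind of case that using both quantities together is meant to filter out. A moderated t-statistic, as mentioned in the abstract, would additionally borrow variance information across genes and is not attempted here.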