Sioson Allan A, Mane Shrinivasrao P, Li Pinghua, Sha Wei, Heath Lenwood S, Bohnert Hans J, Grene Ruth
Department of Computer Science, Virginia Tech, Blacksburg, USA.
BMC Bioinformatics. 2006 Apr 20;7:215. doi: 10.1186/1471-2105-7-215.
Analysis of DNA microarray data takes as input spot intensity measurements from scanner software and returns differential expression of genes between two conditions, together with a statistical significance assessment. This process typically consists of two steps: data normalization and identification of differentially expressed genes through statistical analysis. The Expresso microarray experiment management system implements these steps with a two-stage, log-linear ANOVA mixed model technique, tailored to individual experimental designs. The complement of tools in TM4, on the other hand, is based on a number of preset design choices that limit its flexibility. In the TM4 microarray analysis suite, normalization, filter, and analysis methods form an analysis pipeline. TM4 computes integrated intensity values (IIV) from the average intensities and spot pixel counts returned by the scanner software as input to its normalization steps. By contrast, Expresso can use either IIV data or median intensity values (MIV). Here, we compare Expresso and TM4 analysis of two experiments and assess the results against qRT-PCR data.
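To make the two input types concrete, the following is a minimal sketch of how an IIV and an MIV could be derived for a single spot. It assumes that the integrated intensity is simply the average spot intensity multiplied by the spot pixel count, with no background correction; the function names and pixel values are hypothetical and are not taken from TM4 or Expresso.

```python
from statistics import median


def integrated_intensity(mean_intensity, pixel_count):
    """Assumed IIV: average spot intensity times the number of spot pixels."""
    return mean_intensity * pixel_count


def median_intensity(pixel_values):
    """MIV: the median of the pixel intensities within one spot."""
    return median(pixel_values)


# Hypothetical pixel intensities for one spot, including one bright outlier.
spot_pixels = [100, 110, 120, 130, 900]

# Scanner software would normally report the mean and the pixel count directly.
mean_value = sum(spot_pixels) / len(spot_pixels)
iiv = integrated_intensity(mean_value, len(spot_pixels))
miv = median_intensity(spot_pixels)

print("IIV:", iiv)
print("MIV:", miv)
```

In this toy spot the single bright pixel raises the mean, and therefore the IIV, while the median is unaffected. This illustrates one way the two inputs can differ; it is not a claim about the data sets analyzed here.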
Expresso analysis using MIV data consistently identifies more genes as differentially expressed than Expresso analysis using IIV data. The typical TM4 normalization and filtering pipeline corrects systematic intensity-specific bias on a per-microarray basis. Subsequent statistical analysis with Expresso or a TM4 t-test can effectively identify differentially expressed genes. The best agreement with qRT-PCR data is obtained with Expresso analysis of MIV data.
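As an illustration of the kind of t-test applied after normalization, the sketch below computes a one-sample t statistic on the log2 ratios of a single gene across replicate arrays, testing against a mean log ratio of zero. It is a generic textbook formulation under stated assumptions and not the TM4 implementation; the intensities and function names are hypothetical, and no p-value is computed.

```python
import math
from statistics import mean, stdev


def log2_ratios(treated, control):
    """Per-replicate log2 ratio of treated to control intensity."""
    return [math.log2(t / c) for t, c in zip(treated, control)]


def one_sample_t(values, mu=0.0):
    """t statistic for the null hypothesis that the mean of values equals mu."""
    n = len(values)
    standard_error = stdev(values) / math.sqrt(n)
    return (mean(values) - mu) / standard_error


# Hypothetical normalized intensities for one gene on three replicate arrays.
treated = [400.0, 800.0, 1600.0]
control = [100.0, 100.0, 100.0]

ratios = log2_ratios(treated, control)
t_stat = one_sample_t(ratios)

print("log2 ratios:", ratios)
print("t statistic:", t_stat)
```

A large absolute t statistic relative to the t distribution with n - 1 degrees of freedom would flag the gene as differentially expressed. The mixed-model analysis in Expresso addresses the same question with a richer model of the experimental design.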
The results of this research are of practical value to biologists who analyze microarray data sets. The TM4 normalization and filtering pipeline corrects microarray-specific systematic bias and complements the normalization stage in Expresso analysis. The results of Expresso using MIV data have the best agreement with qRT-PCR results. In one experiment, MIV is a better choice than IIV as input to data normalization and statistical analysis methods, as it yields a greater number of statistically significant differentially expressed genes; TM4 does not support the choice of MIV input data. Overall, the more flexible and extensive statistical models of Expresso achieve more accurate analytical results, when judged by the yardstick of qRT-PCR data, in the context of an experimental design of modest complexity.