Hua Jianping, Balagurunathan Yoganand, Chen Yidong, Lowey James, Bittner Michael L, Xiong Zixiang, Suh Edward, Dougherty Edward R
Computational Biology Division, Translational Genomics Research Institute, Phoenix, AZ 85004, USA.
EURASIP J Bioinform Syst Biol. 2006;2006(1):43056. doi: 10.1155/BSB/2006/43056.
When using cDNA microarrays, normalization to correct labeling bias is a common preliminary step before further data analysis is applied, its objective being to reduce the variation between arrays. To date, assessment of the effectiveness of normalization has mainly been confined to its effect on detecting differentially expressed genes. Since a major use of microarrays is expression-based phenotype classification, it is important to evaluate microarray normalization procedures relative to classification. Using a model-based approach, we model the systematic-error process to generate synthetic gene-expression values with known ground truth. These synthetic expression values are subjected to typical normalization methods and passed through a set of classification rules, the objective being to carry out a systematic study of the effect of normalization on classification. Three normalization methods are considered: offset, linear regression, and Lowess regression. Seven classification rules are considered: 3-nearest neighbor, linear support vector machine, linear discriminant analysis, regular histogram, Gaussian kernel, perceptron, and multiple perceptron with majority voting. Results for the first three rules are presented in the paper, and the full results are given on a complementary website. The conclusion drawn from the different experimental models considered in the study is that normalization can significantly benefit classification under difficult experimental conditions, with linear and Lowess regression slightly outperforming the offset method.
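As a rough illustration of the three normalization methods named above, the following is a minimal Python sketch that works on the per-spot log-ratio M = log2(R/G) and average log-intensity A = (log2 R + log2 G)/2 of a two-channel array. It is not the study's implementation, and the exact definitions used there may differ. It assumes that "offset" subtracts the mean log-ratio, that linear regression fits M against A, and that Lowess is approximated by a simplified local linear fit with tricube weights and no robustness iterations. The function names and the `frac` span parameter are hypothetical.

```python
import math
from statistics import mean


def ma_values(red, green):
    """Convert two-channel intensities to M = log2(R/G) and A = (log2 R + log2 G) / 2."""
    m = [math.log2(r / g) for r, g in zip(red, green)]
    a = [0.5 * (math.log2(r) + math.log2(g)) for r, g in zip(red, green)]
    return m, a


def offset_normalize(m):
    """Offset method (assumed form): subtract the mean log-ratio so M is centred at 0."""
    c = mean(m)
    return [x - c for x in m]


def _weighted_line(xs, ys, ws):
    """Weighted least-squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    sw = sum(ws)
    xbar = sum(w * x for w, x in zip(ws, xs)) / sw
    ybar = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - xbar) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - xbar) * (y - ybar) for w, x, y in zip(ws, xs, ys))
    b1 = sxy / sxx if sxx > 0 else 0.0  # flat fit if x has no spread
    return ybar - b1 * xbar, b1


def linear_normalize(m, a):
    """Linear regression method (assumed form): fit M ~ b0 + b1*A, keep residuals."""
    b0, b1 = _weighted_line(a, m, [1.0] * len(m))
    return [mi - (b0 + b1 * ai) for mi, ai in zip(m, a)]


def lowess_normalize(m, a, frac=0.5):
    """Simplified Lowess (hypothetical helper): at each A value, fit a local line
    using tricube weights over the nearest frac * n points, and subtract it from M.
    No robustness iterations are performed."""
    n = len(m)
    k = min(n, max(2, math.ceil(frac * n)))
    out = []
    for x0, m0 in zip(a, m):
        # Bandwidth = distance to the k-th nearest A value (including x0 itself).
        h = sorted(abs(x - x0) for x in a)[k - 1] or 1e-12
        ws = [(1 - min(abs(x - x0) / h, 1.0) ** 3) ** 3 for x in a]
        b0, b1 = _weighted_line(a, m, ws)
        out.append(m0 - (b0 + b1 * x0))
    return out


# Usage: a synthetic array whose labeling bias grows linearly with log-intensity.
green = [2.0 ** k for k in range(4, 14)]
red = [g * 2.0 ** (0.3 + 0.1 * math.log2(g)) for g in green]
m, a = ma_values(red, green)
print([round(x, 3) for x in offset_normalize(m)])
print([round(x, 3) for x in linear_normalize(m, a)])
print([round(x, 3) for x in lowess_normalize(m, a)])
```

In this synthetic example the bias is linear in A, so the offset method only recentres the log-ratios while both regression methods remove the bias entirely. Lowess would be expected to pay off over linear regression when the bias curves with intensity.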