Ritchie Matthew E, Silver Jeremy, Oshlack Alicia, Holmes Melissa, Diyagama Dileepa, Holloway Andrew, Smyth Gordon K
Department of Oncology, University of Cambridge, CRUK Cambridge Research Institute, Li Ka Shing Centre, Robinson Way, Cambridge CB2 0RE, UK.
Bioinformatics. 2007 Oct 15;23(20):2700-7. doi: 10.1093/bioinformatics/btm412. Epub 2007 Aug 25.
Microarray data must be background corrected to remove the effects of non-specific binding or spatial heterogeneity across the array, but this practice typically causes other problems such as negative corrected intensities and high variability of low intensity log-ratios. Different estimators of background, and various model-based processing methods, are compared in this study in search of the best option for differential expression analyses of small microarray experiments.
Using data for which some independent truth about gene expression is known, eight background correction alternatives are compared in terms of the precision and bias of the resulting gene expression measures, and in terms of their ability to detect differentially expressed genes as judged by two popular algorithms, SAM and limma eBayes. A new background processing method (normexp), based on a convolution model, is introduced. The model-based correction methods are shown to be markedly superior to the usual practice of subtracting local background estimates. Methods that stabilize the variances of the log-ratios across the intensity range perform best. The normexp+offset method gives the lowest false discovery rate overall, followed by morph and vsn. Like vsn, normexp is applicable to most types of two-colour microarray data.
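The idea behind a normexp-style correction can be sketched in a few lines. This is a minimal illustration, not limma's implementation (in limma the method is reached through the R function `backgroundCorrect()` with `method="normexp"` and an `offset` argument). It assumes the usual normal-plus-exponential form of the convolution model: an observed intensity is X = B + S, with background B ~ N(mu, sigma^2) and signal S exponential with mean alpha. The corrected intensity is the conditional mean E[S | X = x], which is always positive, unlike plain subtraction. The parameters mu, sigma and alpha are taken as known here. limma estimates them from the data, and this sketch does not attempt that step. The names `normexp_signal` and `log_ratio` and all numeric values are hypothetical.

```python
import math


def normexp_signal(x, mu, sigma, alpha):
    """Posterior mean E[S | X = x] under an assumed normal + exponential model.

    Assumed model (illustrative): X = B + S, B ~ N(mu, sigma^2), S ~ Exp(mean alpha),
    independent. Given X = x, S is normal with mean m = x - mu - sigma^2/alpha and
    sd sigma, truncated to S >= 0, so E[S | X = x] = m + sigma * phi(z) / Phi(z), z = m/sigma.
    """
    m = x - mu - sigma * sigma / alpha
    z = m / sigma
    if z > -30.0:
        phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal pdf
        cdf = 0.5 * math.erfc(-z / math.sqrt(2.0))  # standard normal cdf
        return m + sigma * phi / cdf
    # Far lower tail: phi and Phi both approach underflow, so use the asymptotic
    # expansion phi(z)/Phi(z) ~ t + 1/t - 2/t^3 with t = -z, giving sigma*(1/t - 2/t^3).
    t = -z
    return sigma * (1.0 / t - 2.0 / t ** 3)


def log_ratio(red, green, offset=0.0):
    """Hypothetical helper: log2 ratio after adding a constant offset to both channels."""
    return math.log2((red + offset) / (green + offset))


if __name__ == "__main__":
    # Made-up parameter values, chosen only to illustrate the behaviour.
    mu, sigma, alpha = 100.0, 20.0, 500.0
    for x in (60.0, 90.0, 120.0, 1000.0):
        subtracted = x - mu  # negative for dim spots, so log2 would be undefined
        corrected = normexp_signal(x, mu, sigma, alpha)  # always > 0
        print(f"x={x:7.1f}  x-mu={subtracted:7.1f}  normexp={corrected:8.2f}")
```

Adding an offset before taking logs (the "+offset" in normexp+offset) pulls low-intensity log-ratios towards zero. For example, `log_ratio(4.0, 1.0)` is 2.0, while `log_ratio(4.0, 1.0, offset=50.0)` is much closer to zero. This is one simple way to damp the high variability of low-intensity log-ratios.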
The background correction methods compared in this article are available in the R package limma (Smyth, 2005) from http://www.bioconductor.org.
Supplementary data are available from http://bioinf.wehi.edu.au/resources/webReferences.html.