Parrish Rudolph S, Spencer Horace J
Department of Bioinformatics and Biostatistics, School of Public Health and Information Sciences, University of Louisville, Louisville, KY 40292, USA.
J Biopharm Stat. 2004 Aug;14(3):575-89. doi: 10.1081/BIP-200025650.
Normalization techniques are used to reduce variation among gene expression measurements in oligonucleotide microarrays, with the aim of improving data quality and the power of significance tests for detecting differential expression. Of the several methods that have been proposed, two commonly employed ones are median-interquartile range (median-IQR) normalization and quantile normalization. The median-IQR method applied directly to fold-changes for paired data was also considered. Two methods for calculating gene expression values are the MAS 5.0 algorithm [Affymetrix. (2002). Statistical Algorithms Description Document. Santa Clara, CA: Affymetrix, Inc. http://www.affymetrix.com/support/technical/whitepapers/sadd-whitepaper.pdf] and the RMA method [Irizarry, R. A., Bolstad, B. M., Collin, F., Cope, L. M., Hobbs, B., Speed, T. P. (2003a). Summaries of Affymetrix GeneChip probe level data. Nucleic Acids Res. 31(4,e15); Irizarry, R. A., Hobbs, B., Collin, F., Beazer-Barclay, Y. D., Antonellis, K. J., Scherf, U., Speed, T. P. (2003b). Exploration, normalization, and summaries of high density oligonucleotide array probe-level data. Biostatistics 4(2):249-264; Irizarry, R. A., Gautier, L., Cope, L. (2003c). An R package for analysis of Affymetrix oligonucleotide arrays. In: Parmigiani, R. I. G., Garrett, E. S., Ziegler, S., eds. The Analysis of Gene Expression Data: Methods and Software. Berlin: Springer, pp. 102-119].
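For readers unfamiliar with the two normalization schemes named above, the following is a minimal sketch in Python and not the authors' implementation. It assumes each array is a list of log-scale expression values with genes in the same order, that median-IQR normalization means centering each array at median 0 and scaling to IQR 1 (the abstract does not specify the common target), and that quantile normalization uses the mean of the sorted values as the reference distribution. The function names are hypothetical.

```python
from statistics import mean, median, quantiles


def median_iqr_normalize(arrays):
    """Center each array at median 0 and scale it to IQR 1.

    Assumption: a target of (0, 1) is used for illustration only.
    Raises ZeroDivisionError if an array has an IQR of zero.
    """
    normalized = []
    for values in arrays:
        q1, _, q3 = quantiles(values, n=4, method="inclusive")
        med = median(values)
        normalized.append([(v - med) / (q3 - q1) for v in values])
    return normalized


def quantile_normalize(arrays):
    """Force every array to share one empirical distribution.

    The reference distribution is the mean, across arrays, of the
    values at each rank. Ties within an array are broken by gene
    order, which is a simplification.
    """
    n_genes = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    reference = [mean(s[i] for s in sorted_arrays) for i in range(n_genes)]

    normalized = []
    for values in arrays:
        # Gene indices ordered from smallest to largest value.
        order = sorted(range(n_genes), key=lambda i: values[i])
        result = [0.0] * n_genes
        for rank, gene in enumerate(order):
            result[gene] = reference[rank]
        normalized.append(result)
    return normalized
```

Both functions operate on each array as a whole, so the values of every gene are changed by the behavior of all other genes on the array.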
When these methods are applied to a prostate cancer data set derived from paired samples of normal and tumor tissue, it is shown that normalization may substantially inflate the number of genes identified by paired t-tests, even after adjustment for multiple testing. This is shown to be due primarily to an unintended effect that normalization has on the experimental error variance. The impact appears to be greater for the RMA method than for the MAS 5.0 algorithm, and for quantile normalization than for median-IQR normalization.
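To make the role of the experimental error variance concrete, here is a minimal sketch of a per-gene paired t-statistic in Python. It is an illustration under stated assumptions, not the authors' analysis code: inputs are assumed to be log-scale expression values for one gene across matched normal and tumor samples, and the function name is hypothetical. P-values and the multiple-testing adjustment are omitted because they require the t distribution, which the Python standard library does not provide.

```python
from math import sqrt
from statistics import mean, variance


def paired_t(normal, tumor):
    """Return (mean difference, error variance, t-statistic) for one gene.

    normal and tumor hold matched log-expression values, one pair per
    patient. Raises ZeroDivisionError if all differences are identical.
    """
    diffs = [t - n for n, t in zip(normal, tumor)]
    n_pairs = len(diffs)
    d_bar = mean(diffs)          # mean log fold-change
    s2 = variance(diffs)         # sample variance of the paired differences
    t_stat = d_bar / sqrt(s2 / n_pairs)
    return d_bar, s2, t_stat
```

Because the error variance sits in the denominator, any processing step that shrinks it without changing the mean difference yields a larger t-statistic. Comparing the second return value before and after normalization is one simple way to inspect that effect on a given data set.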