Yang Hyuna, Harrington Christina A, Vartanian Kristina, Coldren Christopher D, Hall Rob, Churchill Gary A
The Jackson Laboratory, Bar Harbor, ME, USA.
PLoS One. 2008;3(11):e3724. doi: 10.1371/journal.pone.0003724. Epub 2008 Nov 14.
The quality of gene expression microarray data has improved dramatically since the first arrays were introduced in the late 1990s. However, the reproducibility of data generated at multiple laboratory sites remains a matter of concern, especially for scientists who are attempting to combine and analyze data from public repositories. We have carried out a study in which a common set of RNA samples was assayed five times in four different laboratories using Affymetrix GeneChip arrays. We observed dramatic differences in the results across laboratories and identified batch effects in array processing as one of the primary causes of these differences. When batch processing of samples is confounded with experimental factors of interest, it is not possible to separate their effects, and lists of differentially expressed genes may include many artifacts. This study demonstrates the substantial impact of sample processing on microarray analysis results and underscores the need for randomization in the laboratory as a means to avoid confounding of biological factors with procedural effects.
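The randomization the abstract calls for can be illustrated with a small sketch. It is not the authors' protocol: the function name `assign_batches`, the sample labels, and the batch count are all hypothetical, and the sketch assumes each sample belongs to exactly one biological group. It shuffles samples within each group and deals them out round-robin, so that every processing batch receives a mix of groups and batch is not confounded with the factor of interest.

```python
import random
from collections import defaultdict


def assign_batches(samples, n_batches, seed=0):
    """Randomly assign samples to processing batches, stratified by group.

    samples: list of (sample_id, group) pairs (hypothetical labels).
    Returns a dict mapping sample_id -> batch index in [0, n_batches).
    """
    rng = random.Random(seed)

    # Collect the sample ids belonging to each biological group.
    by_group = defaultdict(list)
    for sample_id, group in samples:
        by_group[group].append(sample_id)

    assignment = {}
    offset = 0  # carried across groups so overall batch sizes stay balanced
    for group in sorted(by_group):
        ids = by_group[group]
        rng.shuffle(ids)  # random order within the group
        for i, sample_id in enumerate(ids):
            # Round-robin dealing spreads each group across all batches.
            assignment[sample_id] = (offset + i) % n_batches
        offset += len(ids)
    return assignment


# Illustrative data: 8 treated and 8 control samples, 4 processing batches.
samples = [(f"T{i}", "treated") for i in range(8)] + [
    (f"C{i}", "control") for i in range(8)
]
assignment = assign_batches(samples, n_batches=4, seed=42)

# Tabulate how many samples of each group landed in each batch.
group_of = dict(samples)
counts = defaultdict(lambda: defaultdict(int))
for sample_id, batch in assignment.items():
    counts[batch][group_of[sample_id]] += 1

for batch in sorted(counts):
    print(batch, dict(counts[batch]))
```

With equal group sizes that are multiples of the batch count, as here, each batch ends up with the same number of samples from every group. Which individual samples land in which batch depends on the seed. A confounded design, by contrast, would process all treated samples in one batch and all controls in another, making batch and treatment effects inseparable.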