Yu Haiyuan, Nguyen Katherine, Royce Tom, Qian Jiang, Nelson Kenneth, Snyder Michael, Gerstein Mark
Department of Molecular Biophysics and Biochemistry, Cellular and Developmental Biology, Yale University, CT 06520, USA.
Nucleic Acids Res. 2007;35(2):e8. doi: 10.1093/nar/gkl871. Epub 2006 Dec 7.
Microarray technology is currently one of the most widely used technologies in biology. Many studies focus on inferring the function of an unknown gene from its co-expressed genes. Here, we show that two types of positional artifacts in microarray data introduce spurious correlations between genes. First, we find that genes that are close together on the microarray chip tend to have higher correlations between their expression profiles. We call this the 'chip artifact'. Our calculations suggest that carry-over during the printing process is one of the major sources of this type of artifact, which our experiments later confirmed. Based on our experiments, the measured intensity of a microarray spot contains 0.1% (for fully hybridized spots) to 93% (for unhybridized ones) noise resulting from this artifact. Second, we show for the first time that genes that are close together on the microtiter plates used in microarray experiments also tend to have higher correlations. We call this the 'plate artifact'. Both types of artifacts exist, with varying severity, in all cDNA microarray experiments that we analyzed. We therefore developed an automated web tool, COP (COrrelations by Positional artifacts), to detect these artifacts in microarray experiments. COP has been integrated with the microarray data normalization tool ExpressYourself, which is available at http://bioinfo.mbb.yale.edu/ExpressYourself/. Together, the two can eliminate most of the common noise in microarray data.
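The idea behind the chip artifact can be illustrated with a small simulation. The sketch below is not the COP algorithm. It only assumes that an artifact of this kind would show up as a higher mean Pearson correlation for pairs of spots that are close on the chip than for all pairs. The function names, the use of Chebyshev distance between (row, column) positions, and the 50% carry-over fraction in the simulation are all illustrative assumptions, not values or methods taken from the paper.

```python
import numpy as np


def mean_correlation_by_distance(expr, positions, max_distance=5):
    """Mean Pearson correlation of spot pairs, grouped by distance on the chip.

    expr:      (n_spots, n_conditions) array of expression values.
    positions: (n_spots, 2) integer array of (row, column) chip coordinates.

    Returns a dict with key "all" (mean over every pair) and integer keys
    1..max_distance (mean over pairs at that Chebyshev distance).
    """
    expr = np.asarray(expr, dtype=float)
    positions = np.asarray(positions)

    # Pairwise Pearson correlation between the rows (spots) of expr.
    corr = np.corrcoef(expr)

    # Chebyshev distance between every pair of spot positions
    # (an assumed distance measure, chosen for simplicity).
    diff = np.abs(positions[:, None, :] - positions[None, :, :])
    dist = diff.max(axis=2)

    # Keep each unordered pair once (upper triangle, no diagonal).
    upper = np.triu_indices(len(expr), k=1)
    pair_corr = corr[upper]
    pair_dist = dist[upper]

    result = {"all": float(pair_corr.mean())}
    for d in range(1, max_distance + 1):
        mask = pair_dist == d
        if mask.any():
            result[d] = float(pair_corr[mask].mean())
    return result


def simulate_carry_over(rows=12, cols=12, n_conditions=20,
                        carry_fraction=0.5, seed=0):
    """Simulate independent genes, then add hypothetical print carry-over.

    Each spot receives carry_fraction of the signal of the spot printed
    just before it in the same row. Returns (expr, positions).
    """
    rng = np.random.default_rng(seed)
    true_signal = rng.normal(size=(rows, cols, n_conditions))
    measured = true_signal.copy()
    measured[:, 1:, :] += carry_fraction * true_signal[:, :-1, :]

    # Row-major flattening keeps expr rows aligned with positions.
    expr = measured.reshape(rows * cols, n_conditions)
    positions = np.array([(r, c) for r in range(rows) for c in range(cols)])
    return expr, positions


if __name__ == "__main__":
    expr, positions = simulate_carry_over()
    for key, value in mean_correlation_by_distance(expr, positions).items():
        print(key, round(value, 3))
```

In this simulation the genes are independent by construction, so any excess correlation among neighboring spots comes purely from the simulated carry-over. The same comparison could in principle be applied to microtiter plate coordinates to look for a plate artifact.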