Sásik R, Calvo E, Corbeil J
School of Medicine, University of California San Diego, La Jolla, CA 92093-0679, USA.
Bioinformatics. 2002 Dec;18(12):1633-40. doi: 10.1093/bioinformatics/18.12.1633.
High-density oligonucleotide arrays (GeneChip, Affymetrix, Santa Clara, CA) have become a standard research tool in many areas of biomedical research. They quantitatively monitor the expression of thousands of genes simultaneously by measuring fluorescence from gene-specific targets or probes. The relationship between signal intensities and transcript abundance, as well as normalization issues, has been the focus of much recent attention (Hill et al., 2001; Chudin et al., 2002; Naef et al., 2002a). Researchers need the best possible analytical tools to make the most of the information that this powerful technology offers. At present there are three analytical methods available: the newly released Affymetrix Microarray Suite 5.0 (AMS) software that accompanies the GeneChip product, the method of Li and Wong (LW; Li and Wong, 2001), and the method of Naef et al. (FN; Naef et al., 2001). The AMS method is tailored for analysis of a single microarray and can therefore be used with any experimental design. The LW method, on the other hand, depends on a large number of microarrays in an experiment and cannot be used for an isolated microarray. The FN method is specific to paired microarrays, such as those from an experiment in which each 'treatment' sample has a corresponding 'control' sample. Our focus is on the analysis of experiments that contain a series of samples. In this case only AMS, LW, and the method described in this paper can be used. The present method is model-based, like the LW method, but it assumes multiplicative rather than additive noise and eliminates statistically significant outliers for improved results. Unlike LW and AMS, we do not assume a probe-specific background (measured by the so-called mismatch probes). Rather, we assume a uniform background, whose level is estimated using both the mismatch and perfect match probe intensities.
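To illustrate why a multiplicative noise assumption is convenient, the following is a minimal sketch, not the authors' algorithm. It assumes, purely for illustration, that the perfect match intensity of probe i on array j has the form b + a_i * s_j * exp(noise), where b is a uniform background, a_i a probe affinity and s_j the expression index. Under that assumption, taking the logarithm of the background-subtracted intensity turns the model into an additive one that can be fitted by row and column means. The function name, the model form, and the normalization of affinities to a geometric mean of 1 are all hypothetical; background estimation and outlier elimination, which the abstract mentions, are not implemented here.

```python
import math

def fit_multiplicative_model(intensities, background):
    """Least-squares fit of log(I_ij - b) = log(a_i) + log(s_j) + noise.

    intensities[i][j] is the intensity of probe i on array j (hypothetical
    layout). `background` is an assumed, already-estimated uniform level b.
    Returns (affinities, expression) with the geometric mean of the
    affinities fixed to 1 so that the two factors are identifiable.
    """
    for row in intensities:
        for value in row:
            if value <= background:
                raise ValueError("every intensity must exceed the background")

    # Multiplicative noise becomes additive after the log transform.
    logs = [[math.log(v - background) for v in row] for row in intensities]
    n_probes = len(logs)
    n_arrays = len(logs[0])

    grand_mean = sum(sum(row) for row in logs) / (n_probes * n_arrays)

    # Column means estimate log(s_j); centered row means estimate log(a_i).
    log_expression = [
        sum(logs[i][j] for i in range(n_probes)) / n_probes
        for j in range(n_arrays)
    ]
    log_affinity = [sum(row) / n_arrays - grand_mean for row in logs]

    affinities = [math.exp(v) for v in log_affinity]
    expression = [math.exp(v) for v in log_expression]
    return affinities, expression


# Usage on noise-free synthetic data (three probes, two arrays).
synthetic = [[50.0 + a * s for s in (100.0, 200.0)] for a in (0.5, 1.0, 2.0)]
affinities, expression = fit_multiplicative_model(synthetic, background=50.0)
```

Because the fit is carried out on the log scale, the resulting expression indices are positive by construction, which is consistent with the abstract's remark that negative expression indices are eliminated.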
We present a new method for GeneChip analysis, based on a statistical model with multiplicative noise. We demonstrate that this method yields results superior to those obtained by the Affymetrix Microarray Suite 5.0 software and to those obtained by the model-based method of Li and Wong (Li and Wong, 2001). The present method eliminates the hard-to-interpret negative expression indices, and the binary 'presence' calls (present or absent) are replaced by the statistical significance (p-value) of gene expression. We have found that thresholding the p-values at the (0.1)^16 level produces about the same number of 'present' calls as the AMS software. By testing our method on a pair of replicate GeneChips (hybridized with the same cRNA), we found that 95.6% of data points lie within the 1.25-fold interval. In other words, our method had a 4.4% type I error rate at the 1.25-fold level. The error rate of the LW method was 15%, and that of the AMS method was 29%. There were no points outside the 2-fold interval with the present method. Analysis of variance (ANOVA) of another experiment with multiple replicates shows that this reduction of variance is not accompanied by a corresponding reduction of signal. On the contrary, the signal-to-noise ratio (as measured by the distribution of F-statistics) of the present method is on average 3.4 times better than that of AMS, and 1.4 times better than that of Li and Wong.
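The two evaluation measures described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names are hypothetical, expression indices are assumed to be strictly positive, and the F-statistic is the textbook one-way ANOVA ratio computed for a single gene across groups of replicates.

```python
def fraction_outside_fold(replicate_a, replicate_b, fold=1.25):
    """Fraction of genes whose two replicate values differ by more than `fold`.

    For replicate arrays hybridized with the same sample, any difference
    beyond the fold interval is a false positive, so this fraction plays
    the role of a type I error rate at that fold level.
    """
    if len(replicate_a) != len(replicate_b) or not replicate_a:
        raise ValueError("replicates must be non-empty and of equal length")
    outside = sum(
        1
        for a, b in zip(replicate_a, replicate_b)
        if max(a, b) / min(a, b) > fold
    )
    return outside / len(replicate_a)


def one_way_f_statistic(groups):
    """One-way ANOVA F-statistic for one gene.

    `groups` is a list of lists, one list of replicate measurements per
    experimental condition. A larger F means more between-condition signal
    relative to within-condition noise.
    """
    n_total = sum(len(g) for g in groups)
    n_groups = len(groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )
    return (ss_between / (n_groups - 1)) / (ss_within / (n_total - n_groups))
```

For example, `fraction_outside_fold([100, 100, 100, 100], [110, 130, 90, 100])` returns 0.25, because only the second pair (ratio 1.3) falls outside the 1.25-fold interval.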