Tsai Chen-An, Hsueh Huey-Miin, Chen James J
Division of Biometry and Risk Assessment, National Center for Toxicological Research, Food and Drug Administration, Jefferson, Arkansas 72079, USA.
J Biopharm Stat. 2004 Aug;14(3):553-73. doi: 10.1081/BIP-200025648.
Microarray technology allows the expression levels of a large number of genes to be measured simultaneously. Microarray data generated from an experiment contain inherent biases, and various statistical methods have been proposed for data normalization and data analysis. This paper proposes a generalized additive model for the analysis of gene expression data. The model consists of two sub-models: a non-linear model and a linear model. We propose a two-step normalization algorithm that fits the two sub-models sequentially. The first step is a non-parametric regression using lowess fits to adjust for non-linear systematic biases. The second step uses a linear ANOVA model to estimate the remaining effects, including the gene-by-treatment interaction, which is the effect of interest in a study. The proposed model is a generalization of the ANOVA model for microarray data analysis. We show correspondences between the lowess fit and ANOVA model methods. The normalization procedure assumes neither that the majority of genes keep their expression levels unchanged nor that the two channel intensities from the same spot are independent. The procedure can be applied to either one-channel or two-channel data from experiments with multiple treatments or multiple nuisance factors. Two toxicogenomic experiment data sets and a simulated data set are used to contrast the proposed method with the commonly used lowess fit and ANOVA methods.
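The two-step idea described in the abstract (a lowess fit for non-linear bias, then a linear ANOVA model for the remaining effects) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes one-channel log-intensities stored as `y[gene][array]`, a balanced design, and the per-gene mean log-intensity as the smoothing covariate; all of these choices, and the function names `lowess_fit`, `interaction_effects`, and `two_step`, are hypothetical. The smoother is a plain local-linear tricube fit without the robustness iterations of a full lowess.

```python
def lowess_fit(x, y, frac=0.5):
    """Local linear fit with tricube weights, evaluated at each x[i].
    Simplified lowess: no robustness iterations (an assumption of this sketch)."""
    n = len(x)
    k = max(2, int(round(frac * n)))  # number of neighbours in each window
    fitted = []
    for x0 in x:
        dists = sorted(abs(xi - x0) for xi in x)
        h = dists[k - 1] or 1e-12  # bandwidth = distance to k-th nearest point
        w = [(1 - min(abs(xi - x0) / h, 1.0) ** 3) ** 3 for xi in x]
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted


def interaction_effects(resid, treatments):
    """Gene-by-treatment interaction estimates from a two-way layout:
    cell mean - gene mean - treatment mean + grand mean."""
    levels = sorted(set(treatments))
    n_genes = len(resid)
    cell = {}
    for g, row in enumerate(resid):
        for t in levels:
            vals = [v for v, lab in zip(row, treatments) if lab == t]
            cell[(g, t)] = sum(vals) / len(vals)
    gene_mean = [sum(cell[(g, t)] for t in levels) / len(levels)
                 for g in range(n_genes)]
    trt_mean = {t: sum(cell[(g, t)] for g in range(n_genes)) / n_genes
                for t in levels}
    grand = sum(gene_mean) / n_genes
    return {(g, t): cell[(g, t)] - gene_mean[g] - trt_mean[t] + grand
            for (g, t) in cell}


def two_step(y, treatments, frac=0.5):
    """Step 1: remove a smooth intensity-dependent trend from each array.
    Step 2: estimate interaction effects from the residuals."""
    n_genes, n_arrays = len(y), len(y[0])
    # Smoothing covariate: per-gene mean log-intensity (assumed, not from the paper).
    ref = [sum(row) / n_arrays for row in y]
    resid = [[0.0] * n_arrays for _ in range(n_genes)]
    for a in range(n_arrays):
        col = [y[g][a] for g in range(n_genes)]
        fit = lowess_fit(ref, col, frac)
        for g in range(n_genes):
            resid[g][a] = col[g] - fit[g]
    return interaction_effects(resid, treatments)
```

Calling `two_step(y, ["ctl", "ctl", "trt", "trt"])` returns a dictionary keyed by `(gene_index, treatment_label)`. By construction, the estimates sum to zero across treatments for each gene and across genes for each treatment, which matches the usual ANOVA side conditions.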