Pan Wei, Lin Jizhen, Le Chap T
Division of Biostatistics, School of Public Health, University of Minnesota, A460 Mayo, MMC 303, 420 Delaware Street SE, Minneapolis, MN 55455-0378, USA.
Funct Integr Genomics. 2003 Jul;3(3):117-24. doi: 10.1007/s10142-003-0085-7. Epub 2003 Jul 1.
An exciting biological advance of the past few years is the use of microarray technologies to measure the expression levels of thousands of genes simultaneously. The bottleneck now is how to extract useful information from the resulting large amounts of data. An important and common task in analyzing microarray data is to identify genes whose expression changes between two experimental conditions. We propose a nonparametric statistical approach, called the mixture model method (MMM), for the setting in which only a small number of replicates are available under each experimental condition. Specifically, we propose estimating the distributions of a t-type test statistic and of its null statistic using finite normal mixture models. Comparing these two distributions by means of a likelihood ratio test, or simply using the tail distribution of the null statistic, can identify genes with significantly changed expression. Several methods are proposed to control false positives effectively. The methodology is applied to a data set containing the expression levels of 1,176 genes in rats with and without pneumococcal middle ear infection.
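The core idea (fit one finite normal mixture to the observed t-type statistics Z and another to the null statistics z, then compare them) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the null statistic here is built by giving half the replicates in each condition a +1 sign and half a -1 sign (one random pattern shared by all genes) while reusing the original variances; the number of mixture components `g=3`, the cutoff `c = 0.2` on the ratio f0(Z)/f(Z), and all function names are hypothetical; the data are simulated, not the rat data set.

```python
import numpy as np

def _se(x, y):
    # Standard error of the difference in condition means, per gene.
    return np.sqrt(x.var(axis=1, ddof=1) / x.shape[1] + y.var(axis=1, ddof=1) / y.shape[1])

def t_type_stat(x, y):
    """Per-gene t-type statistic; x and y are (genes, replicates) arrays."""
    return (x.mean(axis=1) - y.mean(axis=1)) / _se(x, y)

def null_stat(x, y, rng):
    """Assumed null statistic: signed replicates (+1 for half, -1 for half)
    cancel each gene's mean effect; the denominator reuses the original variances."""
    def signed_mean(m):
        k = m.shape[1]
        assert k % 2 == 0, "sketch assumes an even number of replicates"
        signs = rng.permutation(np.repeat([1.0, -1.0], k // 2))
        return (m * signs).mean(axis=1)
    return (signed_mean(x) - signed_mean(y)) / _se(x, y)

def _log_norm(z, mu, sd):
    return -0.5 * ((z - mu) / sd) ** 2 - np.log(sd * np.sqrt(2 * np.pi))

def _logsumexp(a):
    # Row-wise log(sum(exp(a))) computed stably.
    m = a.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=1, keepdims=True))).ravel()

def fit_normal_mixture(z, g=3, n_iter=300, seed=0):
    """Fit a g-component 1-D normal mixture to z by EM; returns (pi, mu, sd)."""
    rng = np.random.default_rng(seed)
    pi = np.full(g, 1.0 / g)
    mu = rng.choice(z, size=g, replace=False)
    sd = np.full(g, z.std())
    for _ in range(n_iter):
        log_w = np.log(pi) + _log_norm(z[:, None], mu, sd)      # (n, g)
        resp = np.exp(log_w - _logsumexp(log_w)[:, None])       # E-step
        nk = resp.sum(axis=0) + 1e-12                           # M-step
        pi = nk / nk.sum()
        mu = (resp * z[:, None]).sum(axis=0) / nk
        sd = np.sqrt((resp * (z[:, None] - mu) ** 2).sum(axis=0) / nk)
        sd = np.maximum(sd, 1e-3)  # guard against collapsing components
    return pi, mu, sd

def log_mixture_pdf(x, params):
    pi, mu, sd = params
    return _logsumexp(np.log(pi) + _log_norm(x[:, None], mu, sd))

# Simulated example: 500 genes, 4 replicates per condition,
# the first 50 genes (hypothetically) shifted by 3 units in condition y.
rng = np.random.default_rng(1)
n_genes, n_de = 500, 50
x = rng.normal(size=(n_genes, 4))
y = rng.normal(size=(n_genes, 4))
y[:n_de] += 3.0

Z = t_type_stat(x, y)
z0 = null_stat(x, y, rng)
f = fit_normal_mixture(Z)     # mixture fit to observed statistics
f0 = fit_normal_mixture(z0)   # mixture fit to null statistics

# Likelihood ratio f0(Z)/f(Z): small values suggest changed expression.
ratio = np.exp(log_mixture_pdf(Z, f0) - log_mixture_pdf(Z, f))
flagged = np.flatnonzero(ratio < 0.2)  # c = 0.2 is a hypothetical cutoff

# Simpler alternative: empirical tail probability of |Z| under the null statistics.
tail_p = (np.abs(z0)[None, :] >= np.abs(Z)[:, None]).mean(axis=1)
print(len(flagged), "genes flagged by the ratio rule")
```

Working on the log scale keeps the EM updates and the ratio f0(Z)/f(Z) stable when a statistic falls far into the tails of either fitted mixture; in practice the number of components and the cutoff would be chosen by a model-selection or false-positive criterion rather than fixed by hand.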