Xie Yang, Jeong Kyeong S, Pan Wei, Khodursky Arkady, Carlin Bradley P
Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455-0392, USA.
Comp Funct Genomics. 2004;5(5):432-44. doi: 10.1002/cfg.416.
DNA microarray analysis is a biological technology that permits the whole genome to be monitored simultaneously on a single slide. Microarray technology not only opens an exciting research area for biologists, but also poses significant new challenges for statisticians. Two very common questions arise in the analysis of microarray data. First, should we normalize arrays to remove potential systematic biases, and if so, what normalization method should we use? Second, how should we then implement tests of statistical significance? Straightforward and uniform answers to these questions remain elusive. In this paper, we use a real data example to illustrate a practical approach to addressing these questions. Our data are taken from a DNA-protein binding microarray experiment aimed at furthering our understanding of transcription regulation mechanisms, one of the most important issues in biology. For preprocessing the data, we suggest looking at descriptive plots first to decide whether preliminary normalization is needed and, if so, how it should be accomplished. For subsequent comparative inference, we recommend use of an empirical Bayes method (the B statistic), since it performs much better than traditional methods, such as the sample mean (M statistic) and Student's t statistic, and it is also relatively easy to compute and explain. The false discovery rate (FDR) is used to evaluate the different methods, and our comparative results lend support to these suggestions.
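The two traditional statistics named in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the function names, the simulated log-ratio matrix (rows are genes, columns are replicate arrays), the use of per-array median centering as a stand-in for "normalization", and the FDR estimate, which here is simply the fraction of known-null genes among the top-ranked genes in simulated data. The empirical Bayes B statistic is deliberately not implemented.

```python
import numpy as np


def median_normalize(log_ratios):
    """Subtract each array's (column's) median log ratio.

    One simple global normalization, chosen here only as an illustration.
    """
    return log_ratios - np.median(log_ratios, axis=0, keepdims=True)


def m_statistic(log_ratios):
    """Per-gene sample mean of the log ratios across replicate arrays."""
    return log_ratios.mean(axis=1)


def t_statistic(log_ratios):
    """Per-gene one-sample Student's t statistic against a mean of zero."""
    n = log_ratios.shape[1]
    mean = log_ratios.mean(axis=1)
    sd = log_ratios.std(axis=1, ddof=1)  # unbiased sample standard deviation
    return mean / (sd / np.sqrt(n))


def empirical_fdr(scores, is_null, top_k):
    """Fraction of known-null genes among the top_k genes ranked by |score|.

    Only usable when the truth is known, e.g. in simulated data.
    """
    top = np.argsort(-np.abs(scores))[:top_k]
    return is_null[top].mean()


def simulate(n_genes=1000, n_arrays=4, n_true=50, shift=2.0, seed=0):
    """Hypothetical data: gene-specific noise levels, the first n_true genes shifted."""
    rng = np.random.default_rng(seed)
    sd = rng.uniform(0.2, 1.5, size=(n_genes, 1))
    data = rng.normal(0.0, 1.0, size=(n_genes, n_arrays)) * sd
    data[:n_true] += shift
    is_null = np.ones(n_genes, dtype=bool)
    is_null[:n_true] = False
    return data, is_null


if __name__ == "__main__":
    data, is_null = simulate()
    normalized = median_normalize(data)
    for name, stat in (("M", m_statistic), ("t", t_statistic)):
        fdr = empirical_fdr(stat(normalized), is_null, top_k=50)
        print(f"{name} statistic: empirical FDR among top 50 = {fdr:.2f}")
```

With few replicates, the M statistic ignores gene-specific variability while the t statistic can be inflated by genes whose sample variance happens to be very small; empirical Bayes approaches such as the B statistic are designed to sit between these two extremes.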