Armstrong Nicola J, van de Wiel Mark A
Department of Mathematics, Vrije Universiteit, Amsterdam, The Netherlands.
Cell Oncol. 2004;26(5-6):279-90. doi: 10.1155/2004/943940.
We review several commonly used methods for the design of microarray experiments and the analysis of the resulting data. To begin with, some experimental design issues are addressed. Several approaches for pre-processing the data (filtering and normalization) before the statistical analysis stage are then discussed. A common first step in this type of analysis is gene selection based on statistical testing. Two approaches, permutation methods and model-based methods, are explained, and we emphasize the need to correct for multiple testing. Moreover, powerful approaches based on gene sets are mentioned. Clustering of either genes or samples is frequently performed when analyzing microarray data. We summarize the basics of both unsupervised clustering and supervised classification. The latter may be of use for creating diagnostic arrays, for example. Construction of biological networks, such as pathways, is a statistically challenging and complex task; because it is a relatively new development, it is mentioned only briefly. We finish with some remarks on literature and software. The emphasis in this paper is on the philosophy behind several statistical issues and on a critical interpretation of microarray-related analysis methods.