McShane Lisa M, Shih Joanna H, Michalowska Aleksandra M
Biometric Research Branch, National Cancer Institute, Bethesda, Maryland 20892, USA.
J Mammary Gland Biol Neoplasia. 2003 Jul;8(3):359-74. doi: 10.1023/b:jomg.0000010035.57912.5a.
Appropriate statistical design and analysis of gene expression microarray studies are critical for drawing valid and useful conclusions from expression profiling studies of animal models. This paper discusses several aspects of study design, including the number of animals needed to ensure adequately powered studies, the usefulness of replication and pooling, and the allocation of samples to arrays. Data preprocessing methods are reviewed for both cDNA dual-label spotted arrays and Affymetrix-style oligonucleotide arrays. High-level analysis strategies are briefly discussed for each of three types of study aim: class comparison, class discovery, and class prediction. For class comparison, methods are discussed for identifying genes differentially expressed between classes while guarding against unacceptably high numbers of false positive findings. For class discovery, various clustering methods are discussed. Class prediction methods are briefly reviewed, with emphasis on the importance of proper validation of predictors.
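As a rough illustration of the sample-size question raised in the abstract, the sketch below applies the standard normal-approximation formula for a two-sided, two-sample comparison of a single gene, n = 2((z_{1-α/2} + z_{power})σ/δ)² per class. It is not taken from the paper: the function name `animals_per_class`, its parameters, and the default significance level and power are assumptions chosen for illustration. The stringent default α is only meant to reflect that many genes are tested at once.

```python
import math
from statistics import NormalDist


def animals_per_class(delta, sigma, alpha=0.001, power=0.95):
    """Approximate number of animals per class for one gene.

    Uses the normal approximation for a two-sided, two-sample test.
    All inputs are hypothetical planning values, not values from the paper:
      delta: mean difference in log2 expression worth detecting
      sigma: within-class standard deviation of log2 expression
      alpha: per-gene significance level (small, since many genes are tested)
      power: desired probability of detecting a true difference of size delta
    """
    if delta <= 0 or sigma <= 0:
        raise ValueError("delta and sigma must be positive")
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = std_normal.inv_cdf(power)          # quantile for the target power
    n = 2 * ((z_alpha + z_power) * sigma / delta) ** 2
    return math.ceil(n)  # round up to whole animals
```

For example, `animals_per_class(delta=1.0, sigma=0.5)` gives the approximate number of animals per class needed to detect a two-fold difference (one unit on the log2 scale) when the within-class standard deviation is 0.5. Because it relies on a normal rather than a t distribution, the result is slightly optimistic for very small studies.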