Shih Joanna H, Michalowska Aleksandra M, Dobbin Kevin, Ye Yumei, Qiu Ting Hu, Green Jeffrey E
Biometric Research Branch, Division of Cancer Treatment and Diagnosis, National Cancer Institute, Bethesda, MD 20892, USA.
Bioinformatics. 2004 Dec 12;20(18):3318-25. doi: 10.1093/bioinformatics/bth391. Epub 2004 Jul 9.
In microarray experiments, investigators sometimes pool RNA samples before labeling and hybridization, either because individual samples yield insufficient RNA or to reduce the number of arrays and save cost. The basic assumption of pooling is that the expression of an mRNA molecule in the pool is close to the average expression across the individual samples. A recently proposed method studies the effect of pooling mRNA on the statistical power to detect differentially expressed genes between classes, but it does not distinguish the different sources of variation that arise in microarray experiments. Another recent paper did take different sources of variation into account but did not address power and sample size for class comparison. In this paper, we study the implications of pooling for detecting differential gene expression, taking into account different sources of variation, and we check the basic assumption of pooling using data from both cDNA and Affymetrix GeneChip microarray experiments.
We present formulas for the number of subjects and arrays required to achieve a desired power at a specified significance level. We show that, because a pooled design loses degrees of freedom, a large increase in the number of subjects may be required to achieve power comparable to that of a non-pooled design. The expense of the additional samples needed for the pooled design may outweigh the savings in microarray cost. The microarray data from both platforms show that the basic assumption of pooling may not hold.
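The trade-off between arrays and subjects can be illustrated with a minimal sketch. It is not the paper's formula, which the abstract does not reproduce. It assumes a simple two-component variance model in which one array that pools `pool_size` subjects has variance `var_bio / pool_size + var_tech` (this relies on the pooling assumption holding), and it uses a common t-based sample size approximation with `2 * (n - 1)` degrees of freedom for `n` arrays per class. All function names, parameter names, and numeric values are hypothetical.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, h=0.02):
    """CDF of Student's t via Simpson's rule on [0, |x|] plus symmetry."""
    a = abs(x)
    if a == 0.0:
        return 0.5
    steps = 2 * max(1, math.ceil(a / (2 * h)))  # even number of intervals
    w = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * w, df)
    area = total * w / 3
    return 0.5 + area if x > 0 else 0.5 - area


def t_quantile(p, df, upper=100.0):
    """Quantile of Student's t by bisection; assumes |quantile| < upper."""
    if p < 0.5:
        return -t_quantile(1 - p, df, upper)
    lo, hi = 0.0, upper
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def arrays_per_class(delta, var_bio, var_tech, pool_size,
                     alpha=0.001, power=0.95, max_arrays=1000):
    """Smallest number of arrays per class under the assumed model.

    Assumption: each array pools `pool_size` subjects, so its variance
    is var_bio / pool_size + var_tech, and the two-class comparison
    has 2 * (n - 1) degrees of freedom for n arrays per class.
    """
    var_array = var_bio / pool_size + var_tech
    for n in range(2, max_arrays + 1):
        df = 2 * (n - 1)
        t_sum = t_quantile(1 - alpha / 2, df) + t_quantile(power, df)
        if n >= 2 * t_sum ** 2 * var_array / delta ** 2:
            return n
    raise ValueError("no design found within max_arrays")


# Hypothetical inputs: a 2-fold difference on the log2 scale (delta = 1).
unpooled = arrays_per_class(1.0, var_bio=0.25, var_tech=0.05, pool_size=1)
pooled = arrays_per_class(1.0, var_bio=0.25, var_tech=0.05, pool_size=3)
print("non-pooled: arrays =", unpooled, "subjects =", unpooled)
print("pooled (3 per array): arrays =", pooled, "subjects =", 3 * pooled)
```

Under this sketch, pooling never increases the number of arrays per class, but the total number of subjects (pool size times arrays) is at least as large as in the non-pooled design. The reason is that fewer arrays mean fewer degrees of freedom and therefore larger t critical values.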
Supplementary material referenced in the text is available at http://linus.nci.nih.gov/brb/TechReport.htm.