Kendziorski C, Irizarry R A, Chen K-S, Haag J D, Gould M N
Department of Biostatistics and Medical Informatics and McArdle Laboratory for Cancer Research, University of Wisconsin, Madison, WI 53703, USA.
Proc Natl Acad Sci U S A. 2005 Mar 22;102(12):4252-7. doi: 10.1073/pnas.0500607102. Epub 2005 Mar 8.
More than 15% of the data sets catalogued in the Gene Expression Omnibus Database involve RNA samples that have been pooled before hybridization. Pooling affects data quality and inference, but the exact effects are not yet known because pooling has not been systematically studied in the context of microarray experiments. Here we report the results of an experiment designed to evaluate the utility of pooling and its impact on identifying differentially expressed genes. We find that inference for most genes is not adversely affected by pooling, and we recommend that pooling be done when fewer than three arrays are used in each condition. For larger designs, pooling does not significantly improve inferences if few subjects are pooled. The realized benefits in this case do not outweigh the price paid for the loss of individual-specific information. Pooling is beneficial when many subjects are pooled, provided that independent samples contribute to multiple pools.
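The trade-off described above can be illustrated with a small calculation. This is a minimal sketch under a simple additive variance model that is an assumption here and is not taken from the abstract: each array measurement on the log scale carries biological variance `bio_var` and technical variance `tech_var`, and pooling `subjects_per_pool` independent subjects is assumed to average their biological effects. The function name and the variance values are hypothetical, chosen only for illustration.

```python
def variance_of_condition_mean(n_arrays, subjects_per_pool, bio_var, tech_var):
    """Variance of the estimated mean log expression for one condition.

    Assumed model (not from the abstract): each array measures one pool,
    every pool is built from distinct subjects, and pooling averages the
    biological effects, so one array has variance
    bio_var / subjects_per_pool + tech_var.
    """
    if n_arrays < 1 or subjects_per_pool < 1:
        raise ValueError("n_arrays and subjects_per_pool must be at least 1")
    per_array_var = bio_var / subjects_per_pool + tech_var
    return per_array_var / n_arrays


# Hypothetical variance components, for illustration only.
BIO_VAR = 1.0
TECH_VAR = 0.25

# Design name -> (number of arrays, subjects per pool).
designs = {
    "12 subjects, 12 arrays, no pooling": (12, 1),
    "12 subjects, 4 arrays, pools of 3": (4, 3),
    "4 subjects, 4 arrays, no pooling": (4, 1),
}

results = {
    name: variance_of_condition_mean(n, r, BIO_VAR, TECH_VAR)
    for name, (n, r) in designs.items()
}

for name, var in results.items():
    print(f"{name}: variance of mean = {var:.4f}")
```

Under these assumed values, pooling three subjects per array gives a lower variance than four unpooled arrays, but a higher variance than running all twelve subjects on separate arrays. The calculation says nothing about whether the averaging assumption holds in practice, which is the kind of question the experiment reported here addresses empirically.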