Kendziorski C M, Zhang Y, Lan H, Attie A D
Department of Biostatistics and Medical Informatics, University of Wisconsin, 6729 Medical Sciences Center, 1300 University Avenue, Madison, WI 53792, USA.
Biostatistics. 2003 Jul;4(3):465-77. doi: 10.1093/biostatistics/4.3.465.
In microarray experiments, messenger RNA (mRNA) samples are often pooled across subjects, either out of necessity or to reduce the effect of biological variation. A basic problem in such experiments is estimating the nominal expression levels of a large number of genes. Pooling affects expression estimation, but the exact effects are not yet known because the approach has not been systematically studied in this context. We consider how mRNA pooling affects expression estimates by assessing the finite-sample performance of different estimators for designs with and without pooling. We define conditions under which pooling mRNA is advantageous and, under these conditions, derive general properties of estimates from both pooled and non-pooled designs. A formula is given for the total number of subjects and arrays required in a pooled experiment to obtain gene expression estimates and confidence intervals comparable to those from the no-pooling case. The formula demonstrates that, by pooling a possibly larger number of subjects, one can reduce the number of arrays required in an experiment without a loss of precision. The assumptions that facilitate derivation of this formula are examined using data from a quantitative real-time PCR experiment. The calculations are not specific to any one method of quantifying gene expression, as they assume only that a single normalized estimate of expression is obtained for each gene. As such, the results should apply generally to a number of technologies, provided that sufficient pre-processing and normalization methods are available and applied.
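The abstract does not state the formula itself, so the following is only a minimal sketch of the kind of trade-off it describes, not the authors' derivation. It assumes a simple additive variance-components model for one gene: independent biological variance between subjects and technical variance between arrays, with a pool behaving as the average of its subjects on the measurement scale. The function names, parameter names, and numeric values are hypothetical and chosen for illustration.

```python
import math


def estimator_variance(bio_var, tech_var, n_pools, subjects_per_pool,
                       arrays_per_pool=1):
    """Variance of the mean expression estimate for one gene.

    Assumed model (not taken from the paper): averaging over all subjects
    divides the biological variance by the total number of subjects, and
    averaging over all arrays divides the technical variance by the total
    number of arrays. A design without pooling has subjects_per_pool == 1.
    """
    n_subjects = n_pools * subjects_per_pool
    n_arrays = n_pools * arrays_per_pool
    return bio_var / n_subjects + tech_var / n_arrays


def pools_needed(bio_var, tech_var, n_unpooled, subjects_per_pool):
    """Smallest number of pools (one array each) whose estimator variance
    does not exceed that of an unpooled design with n_unpooled subjects."""
    target = estimator_variance(bio_var, tech_var, n_unpooled, 1)
    per_pool = bio_var / subjects_per_pool + tech_var
    # Small tolerance guards against floating-point round-up at exact ties.
    return math.ceil(per_pool / target - 1e-9)


# Illustrative values only: biological variance 4, technical variance 1.
# Baseline: 12 subjects, each hybridized to its own array.
n_pools = pools_needed(bio_var=4.0, tech_var=1.0, n_unpooled=12,
                       subjects_per_pool=3)
print(n_pools, "pools,", n_pools * 3, "subjects")  # 6 pools, 18 subjects
```

Under these assumed values, 18 subjects pooled three at a time onto 6 arrays give a variance no larger than 12 subjects on 12 arrays, which illustrates the abstract's point that more subjects can be traded for fewer arrays. The gain shrinks as technical variance grows relative to biological variance, since pooling only reduces the biological component.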