Brody James P, Williams Brian A, Wold Barbara J, Quake Stephen R
Departments of Applied Physics and Biology, California Institute of Technology, Pasadena, CA 91125, USA.
Proc Natl Acad Sci U S A. 2002 Oct 1;99(20):12975-8. doi: 10.1073/pnas.162468199. Epub 2002 Sep 16.
DNA microarrays are important devices for high-throughput measurements of gene expression, but no rational foundation has been established for understanding the sources of within-chip statistical error. We designed a specialized chip and protocol to investigate the distribution and magnitude of within-chip errors and discovered that, as theory predicts, measurement errors follow a Lorentzian-like distribution, which explains the widely observed but previously unexplained poor reproducibility of microarray data. Using this specially designed chip, we examined a data set of repeated measurements to extract estimates of the distribution and magnitude of statistical errors in DNA microarray measurements. Using the common "ratio of medians" method, we find that the measurements follow a Lorentzian-like distribution, which is problematic for subsequent analysis. We show that a method of analysis dubbed "median of ratios" yields a more Gaussian-like distribution of errors. Finally, we show that the bootstrap algorithm can be used to extract the best estimates of the error in the measurement. Quantifying the statistical error in such measurements has important applications for estimating significance levels, clustering algorithms, and process optimization.
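The abstract names two estimators and a bootstrap error estimate without giving an implementation. The following is a minimal sketch of how they might be computed for a single two-channel spot, assuming the per-pixel definitions commonly used in microarray image analysis: "ratio of medians" divides the median pixel intensity of one channel by that of the other, while "median of ratios" takes the median of the pixel-by-pixel ratios. The function names, the `simulate_spot` helper, and all numeric parameters are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import random
import statistics


def ratio_of_medians(red, green):
    """Median intensity of one channel divided by the median of the other."""
    return statistics.median(red) / statistics.median(green)


def median_of_ratios(red, green):
    """Median of the pixel-by-pixel intensity ratios."""
    return statistics.median([r / g for r, g in zip(red, green)])


def bootstrap_error(red, green, estimator, n_resamples=1000, seed=0):
    """Bootstrap standard error of `estimator` for one spot.

    Pixels are resampled with replacement, keeping each red/green pair
    together, and the estimator is recomputed on every resample.
    """
    rng = random.Random(seed)
    n = len(red)
    estimates = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        estimates.append(estimator([red[i] for i in idx],
                                   [green[i] for i in idx]))
    return statistics.stdev(estimates)


def simulate_spot(n_pixels=200, true_ratio=2.0, noise=0.05, seed=1):
    """Hypothetical synthetic spot: red = true_ratio * green * (1 + noise)."""
    rng = random.Random(seed)
    green = [rng.uniform(100.0, 1000.0) for _ in range(n_pixels)]
    red = [true_ratio * g * (1.0 + rng.gauss(0.0, noise)) for g in green]
    return red, green


if __name__ == "__main__":
    red, green = simulate_spot()
    for name, estimator in [("ratio of medians", ratio_of_medians),
                            ("median of ratios", median_of_ratios)]:
        value = estimator(red, green)
        error = bootstrap_error(red, green, estimator)
        print(f"{name}: {value:.3f} +/- {error:.3f}")
```

Resampling the red and green pixels as pairs preserves the within-pixel correlation between channels, which is the quantity the median-of-ratios estimator relies on. The synthetic data here only exercise the code and say nothing about the error distributions reported in the abstract.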