Department of Mathematics and Statistics, Center for Integrated Biosystems, Utah State University, Logan, Utah, United States of America.
PLoS One. 2012;7(8):e39570. doi: 10.1371/journal.pone.0039570. Epub 2012 Aug 2.
Statistical methods to test for differential expression traditionally assume that each gene's expression summaries are independent across arrays. When certain preprocessing methods are used to obtain those summaries, this assumption does not necessarily hold. In general, the erroneous assumption of dependence results in a loss of statistical power. We introduce a diagnostic measure of numerical dependence for gene expression summaries from any preprocessing method and discuss the relative performance of several common preprocessing methods with respect to this measure. Some common preprocessing methods introduce non-trivial levels of numerical dependence. The issue of between-array dependence has received little, if any, attention in the literature. Researchers working with gene expression data should not take independence across arrays for granted, or they risk unnecessarily losing statistical power.