Comments on the analysis of unbalanced microarray data.

Affiliation

Department of Biostatistics, Box 357232, University of Washington, Seattle, WA 98195, USA.

Publication information

Bioinformatics. 2009 Aug 15;25(16):2035-41. doi: 10.1093/bioinformatics/btp363. Epub 2009 Jun 15.

DOI:10.1093/bioinformatics/btp363

PMID:19528084

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2732368/

Abstract

MOTIVATION

Permutation testing is very popular for analyzing microarray data to identify differentially expressed (DE) genes; estimating false discovery rates (FDRs) is a very popular way to address the inherent multiple testing problem. However, combining these approaches may be problematic when sample sizes are unequal.

RESULTS

With unbalanced data, permutation tests may not be suitable because they do not test the hypothesis of interest. In addition, permutation tests can be biased. Using biased P-values to estimate the FDR can produce unacceptable bias in those estimates. Results also show that the approach of pooling permutation null distributions across genes can produce invalid P-values, since even non-DE genes can have different permutation null distributions. We encourage researchers to use statistics that have been shown to reliably discriminate DE genes, but caution that associated P-values may be either invalid, or a less-effective metric for discriminating DE genes.
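The first two points can be illustrated with a small simulation. The sketch below is not the authors' analysis; it is a minimal illustration under assumed settings (normal data, equal means, unequal variances, a difference-in-means statistic, and hypothetical helper names `perm_pvalue` and `rejection_rate`). A permutation test assumes the two groups have identical distributions under the null, so a gene with equal means but unequal variances is non-DE in the mean sense yet violates the permutation null. When the smaller group is also the more variable one, the test rejects more often than the nominal level.

```python
import random


def perm_pvalue(x, y, n_perm, rng):
    """Two-sided Monte Carlo permutation p-value for the difference in means."""
    n_x, n_y = len(x), len(y)
    observed = abs(sum(x) / n_x - sum(y) / n_y)
    pooled = list(x) + list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabeling of the samples
        diff = abs(sum(pooled[:n_x]) / n_x - sum(pooled[n_x:]) / n_y)
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the Monte Carlo p-value strictly positive.
    return (hits + 1) / (n_perm + 1)


def rejection_rate(n_x, n_y, sd_x, sd_y, n_sim=200, n_perm=199,
                   alpha=0.05, seed=0):
    """Fraction of simulated equal-mean data sets rejected at level alpha."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sim):
        # Both groups have mean 0, so the gene is not DE in the mean sense.
        x = [rng.gauss(0.0, sd_x) for _ in range(n_x)]
        y = [rng.gauss(0.0, sd_y) for _ in range(n_y)]
        if perm_pvalue(x, y, n_perm, rng) <= alpha:
            rejections += 1
    return rejections / n_sim


# Same variances in both designs; only the allocation of samples differs.
balanced_rate = rejection_rate(10, 10, sd_x=4.0, sd_y=1.0)
unbalanced_rate = rejection_rate(4, 16, sd_x=4.0, sd_y=1.0)
print(f"balanced   (10 vs 10): {balanced_rate:.3f}")
print(f"unbalanced (4 vs 16):  {unbalanced_rate:.3f}")
```

The sample sizes, standard deviations, and simulation counts are arbitrary choices made to keep the run short. Swapping which group has the larger variance would be expected to push the unbalanced design in the conservative direction instead.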
