Division of Computational Biomedicine, Department of Biological Chemistry, School of Medicine, University of California, Irvine, Irvine, CA, 92697, USA.
Department of Statistics, University of California, Los Angeles, CA, 90095, USA.
Genome Biol. 2022 Mar 15;23(1):79. doi: 10.1186/s13059-022-02648-4.
When identifying differentially expressed genes between two conditions using human population RNA-seq samples, we found by permutation analysis that two popular bioinformatics methods, DESeq2 and edgeR, have unexpectedly high false discovery rates (FDRs). Expanding the analysis to limma-voom, NOISeq, dearseq, and the Wilcoxon rank-sum test, we found that every method except the Wilcoxon rank-sum test often fails to control the FDR. In particular, the actual FDRs of DESeq2 and edgeR sometimes exceed 20% when the target FDR is 5%. Based on these results, we recommend the Wilcoxon rank-sum test for population-level RNA-seq studies with large sample sizes.
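The permutation check described above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names (rank_sum_pvalue, benjamini_hochberg, count_discoveries) are hypothetical, the data are synthetic, the rank-sum p-value uses the normal approximation without continuity correction, and Benjamini-Hochberg adjustment is assumed as the FDR procedure. The idea is that after condition labels are shuffled, no gene is truly differentially expressed, so any gene still called at the target FDR is a false discovery.

```python
import math
import random


def rank_sum_pvalue(x, y):
    """Two-sided Wilcoxon rank-sum p-value (normal approximation, tie-corrected)."""
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n1, n2 = len(x), len(y)
    n = n1 + n2
    ranks = [0.0] * n
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        midrank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = midrank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    w = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    mean = n1 * (n + 1) / 2
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var == 0:  # all values identical: no evidence of a difference
        return 1.0
    z = (w - mean) / math.sqrt(var)
    return math.erfc(abs(z) / math.sqrt(2))


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda idx: pvals[idx])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def count_discoveries(expr, labels, alpha=0.05):
    """Number of genes called differentially expressed at target FDR alpha."""
    pvals = []
    for gene in expr:
        x = [v for v, lab in zip(gene, labels) if lab == 0]
        y = [v for v, lab in zip(gene, labels) if lab == 1]
        pvals.append(rank_sum_pvalue(x, y))
    return sum(q <= alpha for q in benjamini_hochberg(pvals))


if __name__ == "__main__":
    rng = random.Random(0)
    labels = [0] * 40 + [1] * 40
    # Synthetic data (assumption): 200 genes, the first 20 shifted in condition 1.
    expr = [
        [rng.gauss(1.5 if (g < 20 and lab == 1) else 0.0, 1.0) for lab in labels]
        for g in range(200)
    ]
    print("discoveries, original labels:", count_discoveries(expr, labels))
    permuted = labels[:]
    rng.shuffle(permuted)  # breaks any link between expression and condition
    print("discoveries, permuted labels:", count_discoveries(expr, permuted))
```

Repeating the shuffle many times and tracking how often a method reports discoveries on the permuted labels gives an empirical picture of its false discoveries. The same loop could wrap any other method's p-values in place of rank_sum_pvalue.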