Exaggerated false positives by popular differential expression methods when analyzing human population samples.

Affiliations

Division of Computational Biomedicine, Department of Biological Chemistry, School of Medicine, University of California, Irvine, Irvine, CA, 92697, USA.

Department of Statistics, University of California, Los Angeles, CA, 90095, USA.

Publication information

Genome Biol. 2022 Mar 15;23(1):79. doi: 10.1186/s13059-022-02648-4.

Abstract

When identifying differentially expressed genes between two conditions using human population RNA-seq samples, we found a phenomenon by permutation analysis: two popular bioinformatics methods, DESeq2 and edgeR, have unexpectedly high false discovery rates (FDRs). Expanding the analysis to limma-voom, NOISeq, dearseq, and the Wilcoxon rank-sum test, we found that FDR control often fails for every method except the Wilcoxon rank-sum test. In particular, the actual FDRs of DESeq2 and edgeR sometimes exceed 20% when the target FDR is 5%. Based on these results, for population-level RNA-seq studies with large sample sizes, we recommend the Wilcoxon rank-sum test.
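The permutation check described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes that the permutation analysis shuffles the condition labels across samples, that p-values are adjusted with the Benjamini-Hochberg procedure, and that the rank-sum p-value may be taken from the tie-corrected normal approximation (reasonable only for large sample sizes). All function names and the simulated data are hypothetical.

```python
import math
import random


def rank_sum_pvalue(x, y):
    """Two-sided Wilcoxon rank-sum p-value (normal approximation,
    tie-corrected, no continuity correction). Assumes large samples."""
    pooled = sorted((v, g) for g, vals in enumerate((x, y)) for v in vals)
    n1, n2 = len(x), len(y)
    n = n1 + n2
    rank_sum_x, tie_term, i = 0.0, 0.0, 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2          # mean of ranks i+1 .. j
        t = j - i                           # size of this tie group
        tie_term += t**3 - t
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:                          # all values identical
        return 1.0
    z = (u - n1 * n2 / 2) / math.sqrt(var_u)
    return math.erfc(abs(z) / math.sqrt(2))


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted, running_min = [0.0] * m, 1.0
    for rank in range(m, 0, -1):            # walk from largest p to smallest
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def count_discoveries(expr, labels, target_fdr=0.05):
    """Number of genes called differentially expressed at target_fdr.
    expr: one list of expression values per gene; labels: 0/1 per sample."""
    pvals = []
    for gene in expr:
        x = [v for v, lab in zip(gene, labels) if lab == 0]
        y = [v for v, lab in zip(gene, labels) if lab == 1]
        pvals.append(rank_sum_pvalue(x, y))
    return sum(q <= target_fdr for q in benjamini_hochberg(pvals))


def permuted_discoveries(expr, labels, n_perm=20, seed=0):
    """Discoveries per label permutation; after shuffling, any call is false."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n_perm):
        shuffled = list(labels)
        rng.shuffle(shuffled)
        counts.append(count_discoveries(expr, shuffled))
    return counts


if __name__ == "__main__":
    rng = random.Random(1)
    labels = [0] * 20 + [1] * 20            # hypothetical two-condition design
    expr = [[rng.lognormvariate(3, 1) for _ in labels] for _ in range(200)]
    print(permuted_discoveries(expr, labels, n_perm=5))
```

Because shuffled labels carry no real condition effect, a method with valid FDR control should report few or no genes on permuted data. Consistently large counts would point to the kind of inflation the abstract reports for DESeq2 and edgeR.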

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2ed8/8922736/08fc7617fba9/13059_2022_2648_Fig1_HTML.jpg
