Exaggerated false positives by popular differential expression methods when analyzing human population samples.

Affiliations

Division of Computational Biomedicine, Department of Biological Chemistry, School of Medicine, University of California, Irvine, Irvine, CA, 92697, USA.

Department of Statistics, University of California, Los Angeles, CA, 90095, USA.

Publication information

Genome Biol. 2022 Mar 15;23(1):79. doi: 10.1186/s13059-022-02648-4.

DOI:10.1186/s13059-022-02648-4

PMID:35292087

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8922736/

Abstract

When identifying differentially expressed genes between two conditions in human population RNA-seq samples, we found by permutation analysis that two popular bioinformatics methods, DESeq2 and edgeR, have unexpectedly high false discovery rates (FDRs). Expanding the analysis to limma-voom, NOISeq, dearseq, and the Wilcoxon rank-sum test, we found that FDR control often fails for every method except the Wilcoxon rank-sum test. In particular, the actual FDRs of DESeq2 and edgeR sometimes exceed 20% when the target FDR is 5%. Based on these results, we recommend the Wilcoxon rank-sum test for population-level RNA-seq studies with large sample sizes.
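The recommendation in the abstract can be illustrated with a minimal sketch: a per-gene Wilcoxon rank-sum test followed by Benjamini-Hochberg adjustment, plus a label-permutation check that counts how many genes are still called when the condition labels are shuffled. This is not the authors' code. The function names (`benjamini_hochberg`, `wilcoxon_de`, `permuted_discoveries`), the simple counts-per-million scaling, and the default parameters are all assumptions made for illustration.

```python
import numpy as np
from scipy.stats import mannwhitneyu


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, working from the largest p-value downwards.
    adjusted_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.clip(adjusted_sorted, 0.0, 1.0)
    return adjusted


def wilcoxon_de(counts, labels, target_fdr=0.05):
    """Per-gene Wilcoxon rank-sum test on a genes x samples count matrix.

    labels holds 0 or 1 for each sample (column).
    """
    counts = np.asarray(counts, dtype=float)
    labels = np.asarray(labels)
    # Assumption: plain counts-per-million scaling stands in for whatever
    # normalization a real analysis would use.
    cpm = counts / counts.sum(axis=0, keepdims=True) * 1e6
    group_a = cpm[:, labels == 0]
    group_b = cpm[:, labels == 1]
    pvals = mannwhitneyu(group_a, group_b, axis=1,
                         alternative="two-sided").pvalue
    # Genes that are constant across samples can yield NaN; treat as p = 1.
    pvals = np.nan_to_num(pvals, nan=1.0)
    padj = benjamini_hochberg(pvals)
    return pvals, padj, padj <= target_fdr


def permuted_discoveries(counts, labels, n_perm=20, target_fdr=0.05, seed=0):
    """Count genes called after shuffling labels; each call is a false positive
    under the assumption that shuffling removes any true condition effect."""
    rng = np.random.default_rng(seed)
    return [
        int(wilcoxon_de(counts, rng.permutation(labels), target_fdr)[2].sum())
        for _ in range(n_perm)
    ]
```

If a method controls the FDR, the counts returned by `permuted_discoveries` should be zero or close to zero for most permutations; consistently large counts on permuted labels are the kind of signal the abstract describes for DESeq2 and edgeR.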

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2ed8/8922736/08fc7617fba9/13059_2022_2648_Fig1_HTML.jpg
