Beware of counter-intuitive levels of false discoveries in datasets with strong intra-correlations.

Authors

Kanduri Chakravarthi, Mamica Maria, Olstad Emilie Willoch, Zucknick Manuela, Li Jingyi Jessica, Sandve Geir Kjetil

Affiliations

Scientific Computing and Machine Learning Section, Department of Informatics, University of Oslo, Oslo, Norway.

UiORealArt Convergence Environment, University of Oslo, Oslo, Norway.

Publication information

Genome Biol. 2025 Aug 18;26(1):249. doi: 10.1186/s13059-025-03734-z.

DOI:10.1186/s13059-025-03734-z

PMID:40826107

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12359981/

Abstract

The false discovery rate (FDR) controlling method by Benjamini and Hochberg (BH) is a popular choice in the omics fields. Here, we demonstrate that in datasets with a large degree of dependencies between features, FDR correction methods like BH can sometimes counter-intuitively report very high numbers of false positives, potentially misleading researchers. We call the attention of researchers to use suited multiple testing strategies and approaches like synthetic null data (negative control) to identify and minimize caveats related to false discoveries, as in the cases where false findings do occur, they may be numerous.
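
The effect described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' analysis: it assumes a hypothetical one-factor model in which every feature is null and all features share the same pairwise correlation `rho`, and the names `bh_reject`, `null_pvalues`, and `simulate` are made up for this example. Because no feature carries a real effect, every BH rejection is a false discovery, so the output shows how many false discoveries appear per dataset when they appear at all.

```python
import math
import random


def bh_reject(pvals, alpha=0.05):
    """Indices of hypotheses rejected by the Benjamini-Hochberg step-up rule."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0  # largest rank whose p-value is at or below alpha * rank / m
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= alpha * rank / m:
            k = rank
    return order[:k]


def null_pvalues(m, rho, rng):
    """Two-sided p-values for m null z-scores with pairwise correlation rho.

    Assumed toy model: z_i = sqrt(rho) * shared + sqrt(1 - rho) * noise_i,
    so each z_i is standard normal and no feature carries a real effect.
    """
    shared = rng.gauss(0.0, 1.0)
    a, b = math.sqrt(rho), math.sqrt(1.0 - rho)
    return [
        math.erfc(abs(a * shared + b * rng.gauss(0.0, 1.0)) / math.sqrt(2.0))
        for _ in range(m)
    ]


def simulate(m=200, rho=0.0, n_datasets=300, alpha=0.05, seed=0):
    """Number of (necessarily false) BH discoveries in each simulated dataset."""
    rng = random.Random(seed)
    return [
        len(bh_reject(null_pvalues(m, rho, rng), alpha))
        for _ in range(n_datasets)
    ]


if __name__ == "__main__":
    for rho in (0.0, 0.9):
        counts = simulate(rho=rho)
        hits = [c for c in counts if c > 0]
        print(
            f"rho={rho}: datasets with any false discovery = "
            f"{len(hits)}/{len(counts)}, largest count = {max(counts)}"
        )
```

In this toy setting, comparing the largest per-dataset count for `rho=0.0` and `rho=0.9` is the quantity of interest: FDR control bounds the expected proportion of false discoveries across datasets, not how many occur in any single dataset.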

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1232/12359981/8283f5ca4a91/13059_2025_3734_Fig1_HTML.jpg
