Control procedures and estimators of the false discovery rate and their application in low-dimensional settings: an empirical investigation.

Author affiliations

Institute of Medical Biometry and Informatics, University of Heidelberg, Im Neuenheimer Feld 130.3, 69120, Heidelberg, Germany.

Institute for Medical Biometry and Statistics, Faculty of Medicine and Medical Center, University of Freiburg, Stefan-Meier-Str. 26, 79104, Freiburg, Germany.

Publication information

BMC Bioinformatics. 2018 Mar 2;19(1):78. doi: 10.1186/s12859-018-2081-x.

DOI: 10.1186/s12859-018-2081-x

PMID: 29499647

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC5833079/

Abstract

BACKGROUND

When many (up to millions of) statistical tests are conducted in discovery set analyses such as genome-wide association studies (GWAS), approaches that control the family-wise error rate (FWER) or the false discovery rate (FDR) are required to reduce the number of false positive decisions. Some methods were developed specifically for high-dimensional settings and rely in part on an estimate of the proportion of true null hypotheses. However, these approaches are also applied in low-dimensional settings such as replication set analyses, which may be restricted to a small number of specific hypotheses. The aim of this study was to compare different approaches in low-dimensional settings using (a) real data from the CKDGen Consortium and (b) a simulation study.
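To make the FWER/FDR distinction concrete, the following minimal sketch contrasts one standard procedure of each kind. The abstract does not name the specific procedures compared, so Bonferroni (FWER) and Benjamini-Hochberg (FDR) are used here only as familiar representatives, and the p-values are hypothetical illustration data, not CKDGen results.

```python
def bonferroni(pvals):
    """Bonferroni-adjusted p-values (a standard FWER-controlling procedure)."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (a standard FDR-controlling procedure)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values for a small, replication-style set of 8 hypotheses.
pvals = [0.001, 0.008, 0.012, 0.030, 0.200, 0.450, 0.700, 0.900]
alpha = 0.05

bonf_hits = sum(a <= alpha for a in bonferroni(pvals))
bh_hits = sum(a <= alpha for a in benjamini_hochberg(pvals))
print(bonf_hits, bh_hits)  # prints "1 3"
```

On this toy input the FDR procedure rejects more hypotheses than the FWER procedure at the same nominal level, which is the power difference the study examines; whether the extra rejections are correct is the specificity question.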

RESULTS

In both the application and the simulation, FWER approaches were less powerful than FDR control methods, regardless of whether a large or a small number of hypotheses was tested. The q-value method was the most powerful. However, the specificity of this method, that is, its ability to retain true null hypotheses, was markedly reduced when the number of tested hypotheses was small. In this low-dimensional situation, the estimate of the proportion of true null hypotheses was biased.
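The dependence of q-values on the estimated proportion of true null hypotheses can be sketched as follows. This is an illustrative simplification, not the estimator used in the study: it assumes a Storey-type estimate at a single fixed tuning value `lam` (published q-value software typically smooths over a range of such values), and the p-values are hypothetical.

```python
def storey_pi0(pvals, lam=0.5):
    """Storey-type estimate of the proportion of true nulls at a fixed lambda."""
    m = len(pvals)
    tail = sum(p > lam for p in pvals)  # p-values above lambda are mostly nulls
    return min(1.0, tail / (m * (1.0 - lam)))


def q_values(pvals, lam=0.5):
    """q-values as step-up adjusted p-values scaled by the pi0 estimate."""
    m = len(pvals)
    pi0 = storey_pi0(pvals, lam)
    order = sorted(range(m), key=lambda i: pvals[i])
    q = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pi0 * pvals[i] * m / rank)
        q[i] = running_min
    return q


# Hypothetical p-values for a small set of 8 hypotheses.
pvals = [0.001, 0.008, 0.012, 0.030, 0.200, 0.450, 0.700, 0.900]

# The estimate moves with the (arbitrary) choice of lambda.
pi0_by_lambda = {lam: storey_pi0(pvals, lam) for lam in (0.1, 0.5, 0.8)}
q_hits = sum(q <= 0.05 for q in q_values(pvals))

# With m = 8 and lambda = 0.5, one p-value crossing lambda shifts pi0 by this much.
step = 1.0 / (len(pvals) * (1.0 - 0.5))
print(pi0_by_lambda, q_hits, step)
```

With only eight hypotheses the estimate can change only in steps of 0.25 at `lam=0.5`, so a single p-value determines a large share of it. Any underestimate of the proportion of true nulls shrinks every q-value and makes rejections easier, which is consistent with the loss of specificity reported for small numbers of hypotheses.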

CONCLUSIONS

The results highlight the importance of a sizeable data set for reliable estimation of the proportion of true null hypotheses. Consequently, methods relying on this estimate should be applied only in high-dimensional settings. Furthermore, if the focus lies on testing a small number of hypotheses, as in replication settings, FWER methods should be preferred over FDR methods to maintain high specificity.

Figure 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/608c/5833079/82d2aa861a1e/12859_2018_2081_Fig1_HTML.jpg
