Testing a Large Number of Composite Null Hypotheses Using Conditionally Symmetric Multidimensional Gaussian Mixtures in Genome-Wide Studies.

Author information

Sun Ryan, McCaw Zachary R, Lin Xihong

Affiliations

Department of Biostatistics at MD Anderson Cancer Center.

Senior Machine Learning Scientist at Insitro.

Publication information

J Am Stat Assoc. 2025;120(550):605-617. doi: 10.1080/01621459.2024.2422124. Epub 2024 Dec 5.

Abstract

Causal mediation, pleiotropy, and replication analyses are three highly popular genetic study designs. Although these analyses address different scientific questions, the underlying statistical inference problems all involve large-scale testing of composite null hypotheses. The goal is to determine whether all null hypotheses in a set of individual tests, as opposed to at least one, should be rejected simultaneously. Recently, various methods have been proposed for each of these situations, including an appealing two-group empirical Bayes approach that calculates local false discovery rates (lfdr). However, lfdr estimation is difficult because it requires multivariate density estimation. Furthermore, the multiple testing rules of the empirical Bayes lfdr approach can disagree with conventional frequentist z-statistics, which is troubling for a field that ubiquitously relies on summary statistics. This work proposes the conditionally symmetric multidimensional Gaussian mixture model (csmGmm), a framework that unifies two-group testing across composite null settings in genetic association studies. The csmGmm is shown to have more robust operating characteristics than recently proposed alternatives. Crucially, the csmGmm also offers interpretability guarantees by harmonizing lfdr and z-statistic testing rules. We extend the base csmGmm to cover each of the mediation, pleiotropy, and replication settings, and we prove that this agreement between lfdr and z-statistic rules holds in each setting. We apply the model to a collection of translational lung cancer genetic association studies that motivated this work.
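A minimal Python sketch can illustrate the composite null and lfdr ideas above in the two-dimensional case. Take, for example, a mediation test where z1 and z2 are the z-statistics for the exposure-to-mediator and mediator-to-outcome effects. The composite null is "at least one of the two effects is zero", so only the pattern in which both effects are non-null counts as a discovery.

This is a hypothetical simplification, not the authors' csmGmm:

- The prior weights `PI` for the four null/non-null patterns and the shared alternative mean `MU` are hand-picked rather than estimated from data.
- Each non-null z-statistic is assumed to follow an equal mix of N(+MU, 1) and N(-MU, 1).
- No conditional symmetry constraint is fitted.
- The rejection step uses a common cumulative rule: reject the largest set of smallest-lfdr tests whose average lfdr is at most q. This may differ from the procedure used in the paper.

```python
import math


def normal_pdf(x, mean=0.0):
    """Density of N(mean, 1) evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)


def nonnull_pdf(z, mu):
    """Assumed alternative: equal mix of N(+mu, 1) and N(-mu, 1)."""
    return 0.5 * normal_pdf(z, mu) + 0.5 * normal_pdf(z, -mu)


def composite_lfdr(z1, z2, pi, mu):
    """P(at least one effect is null | z1, z2) under a fixed four-pattern mixture.

    `pi` maps each pattern (a, b) to its prior weight; 1 means non-null.
    """
    weighted = {}
    for (a, b), w in pi.items():
        f1 = nonnull_pdf(z1, mu) if a else normal_pdf(z1)
        f2 = nonnull_pdf(z2, mu) if b else normal_pdf(z2)
        weighted[(a, b)] = w * f1 * f2
    # Every pattern except (1, 1) belongs to the composite null.
    null_mass = sum(v for pattern, v in weighted.items() if pattern != (1, 1))
    return null_mass / sum(weighted.values())


def reject_by_lfdr(lfdrs, q):
    """Indices of the largest smallest-lfdr set whose mean lfdr is <= q."""
    order = sorted(range(len(lfdrs)), key=lambda i: lfdrs[i])
    running, k = 0.0, 0
    for rank, i in enumerate(order, start=1):
        running += lfdrs[i]
        if running / rank <= q:
            k = rank
    return sorted(order[:k])


# Hypothetical, hand-picked parameters (not estimated from any data).
PI = {(0, 0): 0.90, (1, 0): 0.04, (0, 1): 0.04, (1, 1): 0.02}
MU = 3.0

pairs = [(4.0, 4.5), (5.0, 0.3), (0.2, 0.1), (-3.5, 3.8)]
lfdrs = [composite_lfdr(z1, z2, PI, MU) for z1, z2 in pairs]
print(reject_by_lfdr(lfdrs, q=0.1))  # → [0, 3]
```

The pair (5.0, 0.3) is not rejected despite its large first statistic, because the composite null is rejected only when both effects look non-null.

In this toy model every pattern shares one `MU`, so the lfdr can only fall as either |z_k| grows. As a result, the lfdr ordering agrees with the z-statistic magnitudes. The abstract notes that lfdr rules from a general fitted mixture can disagree with z-statistics, and the csmGmm's interpretability guarantees are aimed at exactly that problem.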

Similar articles

Dressings and topical agents for treating pressure ulcers.
Cochrane Database Syst Rev. 2017 Jun 22;6(6):CD011947. doi: 10.1002/14651858.CD011947.pub2.

Cited by

Large-scale composite hypothesis testing procedure for omics data analyses.
NAR Genom Bioinform. 2025 Sep 5;7(3):lqaf118. doi: 10.1093/nargab/lqaf118. eCollection 2025 Sep.
