Cauchy combination test: a powerful test with analytic p-value calculation under arbitrary dependency structures.

Author information

Liu Yaowu, Xie Jun

Affiliations

Department of Biostatistics, Harvard School of Public Health.

Department of Statistics, Purdue University.

Publication information

J Am Stat Assoc. 2020;115(529):393-402. doi: 10.1080/01621459.2018.1554485. Epub 2019 Apr 25.

Abstract

Combining individual p-values to aggregate multiple small effects has long been of interest in statistics, dating back to the classic Fisher's combination test. In modern large-scale data analysis, correlation and sparsity are common features, and efficient computation is a necessary requirement for dealing with massive data. To overcome these challenges, we propose a new test that takes advantage of the Cauchy distribution. Our test statistic has a simple form and is defined as a weighted sum of Cauchy transformations of individual p-values. We prove a non-asymptotic result that the tail of the null distribution of our proposed test statistic can be well approximated by a Cauchy distribution under arbitrary dependency structures. Based on this theoretical result, the p-value calculation of our proposed test is not only accurate, but also as simple as that of the classic z-test or t-test, making our test well suited for analyzing massive data. We further show that the power of the proposed test is asymptotically optimal in a strong sparsity setting. Extensive simulations demonstrate that the proposed test has both strong power against sparse alternatives and good accuracy with respect to p-value calculations, especially for very small p-values. The proposed test has also been applied to a genome-wide association study of Crohn's disease and compared with several existing tests.
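The abstract describes the statistic only in words. As we understand the paper's construction, the statistic is T = sum_i w_i * tan((0.5 - p_i) * pi) with nonnegative weights w_i summing to one, and its p-value is approximated by the standard Cauchy upper tail, 1/2 - arctan(T)/pi. The Python sketch below illustrates that idea under those assumptions; the function name `cauchy_combination`, the weight normalization, and the small-p cutoff of 1e-16 are our own illustrative choices, not the authors' software.

```python
import math
from typing import Optional, Sequence


def cauchy_combination(pvalues: Sequence[float],
                       weights: Optional[Sequence[float]] = None) -> float:
    """Sketch of a Cauchy combination p-value (hypothetical helper).

    Assumes every p-value lies strictly in (0, 1) and weights are
    nonnegative; weights are rescaled here so that they sum to one.
    """
    n = len(pvalues)
    if n == 0:
        raise ValueError("need at least one p-value")
    if weights is None:
        weights = [1.0] * n  # equal weights by default (assumption)
    if len(weights) != n:
        raise ValueError("weights and p-values must have the same length")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be nonnegative")
    total = float(sum(weights))
    if total <= 0:
        raise ValueError("weights must have a positive sum")

    stat = 0.0
    for p, w in zip(pvalues, weights):
        if not 0.0 < p < 1.0:
            raise ValueError("p-values must lie strictly between 0 and 1")
        # Cauchy transform tan((0.5 - p) * pi). For very small p the
        # subtraction 0.5 - p loses precision, so fall back to the
        # first-order approximation 1 / (p * pi) (our assumed cutoff).
        if p < 1e-16:
            t = 1.0 / (p * math.pi)
        else:
            t = math.tan((0.5 - p) * math.pi)
        stat += (w / total) * t

    # Upper tail of the standard Cauchy distribution, P(C > stat).
    # For stat > 0, 1/2 - atan(stat)/pi equals atan(1/stat)/pi, which
    # avoids cancellation when the combined p-value is tiny.
    if stat > 0:
        return math.atan(1.0 / stat) / math.pi
    return 0.5 - math.atan(stat) / math.pi
```

One property that makes a convenient sanity check: if every input p-value equals the same value p, the weighted average of the transforms equals tan((0.5 - p) * pi), so the combined p-value is p itself, regardless of the weights.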
