Liu Yaowu, Xie Jun
Department of Biostatistics, Harvard School of Public Health.
Department of Statistics, Purdue University.
J Am Stat Assoc. 2020;115(529):393-402. doi: 10.1080/01621459.2018.1554485. Epub 2019 Apr 25.
Combining individual p-values to aggregate multiple small effects has a long-standing interest in statistics, dating back to the classic Fisher's combination test. In modern large-scale data analysis, correlation and sparsity are common features and efficient computation is a necessary requirement for dealing with massive data. To overcome these challenges, we propose a new test that takes advantage of the Cauchy distribution. Our test statistic has a simple form and is defined as a weighted sum of Cauchy transformation of individual p-values. We prove a non-asymptotic result that the tail of the null distribution of our proposed test statistic can be well approximated by a Cauchy distribution under arbitrary dependency structures. Based on this theoretical result, the p-value calculation of our proposed test is not only accurate, but also as simple as the classic z-test or t-test, making our test well suited for analyzing massive data. We further show that the power of the proposed test is asymptotically optimal in a strong sparsity setting. Extensive simulations demonstrate that the proposed test has both strong power against sparse alternatives and a good accuracy with respect to p-value calculations, especially for very small p-values. The proposed test has also been applied to a genome-wide association study of Crohn's disease and compared with several existing tests.
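The abstract describes the statistic as a weighted sum of Cauchy transformations of the individual p-values, with the combined p-value read off the tail of a Cauchy distribution. A minimal Python sketch of that idea follows. It assumes the transformation is tan((0.5 - p)π) and that the nonnegative weights are normalized to sum to 1, so that the reference distribution is the standard Cauchy; neither detail is spelled out in the abstract. The function name `cauchy_combination` and its parameters are hypothetical, chosen only for illustration.

```python
import math


def cauchy_combination(p_values, weights=None):
    """Sketch of a Cauchy-type p-value combination (assumed form).

    Returns (statistic, combined_p). The transformation tan((0.5 - p) * pi)
    and the weight normalization are assumptions, not taken from the abstract.
    """
    n = len(p_values)
    if n == 0:
        raise ValueError("need at least one p-value")
    if weights is None:
        weights = [1.0] * n  # equal weights by default
    if len(weights) != n:
        raise ValueError("weights and p_values must have the same length")
    total = sum(weights)
    if total <= 0 or any(w < 0 for w in weights):
        raise ValueError("weights must be nonnegative with a positive sum")
    if any(not (0.0 < p < 1.0) for p in p_values):
        raise ValueError("p-values must lie strictly between 0 and 1")

    # Weighted sum of Cauchy-transformed p-values, with weights scaled to sum to 1.
    statistic = sum(
        (w / total) * math.tan((0.5 - p) * math.pi)
        for w, p in zip(weights, p_values)
    )
    # Upper tail probability of a standard Cauchy distribution at `statistic`.
    combined_p = 0.5 - math.atan(statistic) / math.pi
    return statistic, combined_p


if __name__ == "__main__":
    stat, p = cauchy_combination([0.01, 0.20, 0.75, 0.40])
    print(f"statistic = {stat:.4f}, combined p-value = {p:.4g}")
```

The combined p-value needs only one arctangent evaluation and no resampling or correlation estimate, which is the computational simplicity the abstract emphasizes. For extremely small inputs (below roughly 1e-15), the tangent loses floating-point accuracy, so a production implementation would likely need a separate small-p approximation.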