Biostatistics and Research Decision Sciences, Merck Research Laboratories, Rahway, New Jersey, USA.
Department of Mathematical Sciences, Worcester Polytechnic Institute, Worcester, Massachusetts, USA.
Biometrics. 2023 Jun;79(2):1159-1172. doi: 10.1111/biom.13634. Epub 2022 Mar 9.
Combining dependent tests of significance has broad applications, but the related p-value calculation is challenging. For Fisher's combination test, current p-value calculation methods (e.g., Brown's approximation) tend to inflate the type I error rate when the desired significance level is substantially less than 0.05. This problem could lead to a substantial number of false discoveries in big data analyses. This paper makes two main contributions. First, it presents a general family of Fisher-type statistics, referred to as GFisher, which covers many classic statistics, such as Fisher's combination, Good's statistic, Lancaster's statistic, and the weighted Z-score combination. GFisher allows a flexible weighting scheme, as well as an omnibus procedure that automatically adapts the weights and the statistic-defining parameters to the given data. Second, the paper presents several new p-value calculation methods based on two novel ideas: moment-ratio matching and joint-distribution surrogating. Systematic simulations show that the new calculation methods are more accurate under the multivariate Gaussian distribution, and more robust under the generalized linear model and the multivariate t-distribution. The applications of GFisher and the new p-value calculation methods are demonstrated in a gene-based single nucleotide polymorphism (SNP)-set association study. The relevant computation has been implemented in the R package GFisher, available on the Comprehensive R Archive Network (CRAN).
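To make the classic statistics named above concrete, the Python sketch below assumes the GFisher form T = Σ_i w_i F_{d_i}^{-1}(1 − p_i), where F_d is the chi-square CDF with d degrees of freedom, and restricts every d_i to 2 so that only the standard library is needed (F_2^{-1}(1 − p) = −2 ln p). With unit weights this is Fisher's combination statistic; with general weights it corresponds to Good's weighted statistic. It also shows the exact p-value under independence and a Brown-style approximation for the dependent case, which matches the mean and variance of T with a scaled chi-square c·χ²_f. That is the kind of baseline the abstract reports can inflate the type I error at small significance levels. All function names and the user-supplied covariance input are hypothetical illustrations, not the API of the GFisher R package, and the sketch does not implement the paper's moment-ratio matching or joint-distribution surrogating methods.

```python
import math
from typing import Sequence


def gfisher_stat_d2(pvalues: Sequence[float], weights: Sequence[float]) -> float:
    """Assumed GFisher form with every d_i = 2: T = sum_i w_i * (-2 ln p_i).

    Unit weights give Fisher's combination statistic (hypothetical helper).
    """
    if len(pvalues) != len(weights):
        raise ValueError("pvalues and weights must have the same length")
    return sum(-2.0 * w * math.log(p) for p, w in zip(pvalues, weights))


def chi2_sf_even_df(x: float, df: int) -> float:
    """P(chi2_df > x) for even df: exp(-x/2) * sum_{j < df/2} (x/2)^j / j!."""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term = total = 1.0
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total


def reg_upper_gamma(a: float, x: float, eps: float = 1e-15) -> float:
    """Regularized upper incomplete gamma Q(a, x), needed for non-integer df.

    Series expansion for x < a + 1, Lentz continued fraction otherwise.
    """
    if x <= 0.0:
        return 1.0
    log_prefactor = -x + a * math.log(x) - math.lgamma(a)
    if x < a + 1.0:
        ap, term = a, 1.0 / a
        total = term
        for _ in range(10_000):
            ap += 1.0
            term *= x / ap
            total += term
            if abs(term) < abs(total) * eps:
                break
        return 1.0 - total * math.exp(log_prefactor)  # Q = 1 - P
    tiny = 1e-300
    b = x + 1.0 - a
    c, d = 1.0 / tiny, 1.0 / b
    h = d
    for i in range(1, 10_000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = d if abs(d) > tiny else tiny
        c = b + an / c
        c = c if abs(c) > tiny else tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return math.exp(log_prefactor) * h


def fisher_pvalue_independent(pvalues: Sequence[float]) -> float:
    """Exact Fisher p-value for independent p-values: T ~ chi2 with 2n df."""
    t = gfisher_stat_d2(pvalues, [1.0] * len(pvalues))
    return chi2_sf_even_df(t, 2 * len(pvalues))


def brown_pvalue(pvalues: Sequence[float], cov: Sequence[Sequence[float]]) -> float:
    """Brown-style approximation: T is approximated by c * chi2_f.

    `cov` is an assumed, user-supplied null covariance matrix of the terms
    -2 ln(p_i) (diagonal entries equal 4, since each term is chi2_2).
    """
    n = len(pvalues)
    t = gfisher_stat_d2(pvalues, [1.0] * n)
    mean = 2.0 * n                          # E[T] = 2n under the null
    var = sum(sum(row) for row in cov)      # Var[T] = sum of all covariances
    c = var / (2.0 * mean)                  # scale factor
    f = 2.0 * mean ** 2 / var               # effective degrees of freedom
    return reg_upper_gamma(f / 2.0, t / (2.0 * c))  # P(chi2_f > t / c)
```

With a diagonal covariance matrix (all entries 4 on the diagonal, 0 elsewhere) the Brown-style result reduces to the exact independent Fisher p-value, because c = 1 and f = 2n. Positive off-diagonal covariances inflate Var[T], so for small input p-values the approximation returns a larger p-value than the independence formula; how accurate such a two-moment match is in the far tail is exactly what the abstract questions.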