MRC Biostatistics Unit, University of Cambridge, Cambridge, UK.
Department of Medicine, Addenbrooke's Hospital, University of Cambridge, Cambridge, UK.
Biom J. 2021 Jun;63(5):1096-1130. doi: 10.1002/bimj.201900254. Epub 2021 Mar 7.
High-dimensional hypothesis testing is ubiquitous in the biomedical sciences, and informative covariates may be employed to improve power. The conditional false discovery rate (cFDR) is a widely used approach suited to the setting where the covariate is a set of p-values for the equivalent hypotheses for a second trait. Although it is related to the Benjamini-Hochberg procedure, the cFDR does not permit easy control of the type-1 error rate, and existing methods for doing so are over-conservative. We propose a new method for type-1 error rate control based on identifying mappings, defined by the estimated cFDR, from the unit square to the unit interval, and on splitting the observations so that each map is independent of the observations it is used to test. We also propose an adjustment to the existing cFDR estimator that further improves power. We show by simulation that, compared with existing methods, the new method more than doubles the potential improvement in power over unconditional analysis. We demonstrate our method on transcriptome-wide association studies and show that it can be applied iteratively, enabling multiple covariates to be used in succession. Our methods substantially improve the power and applicability of cFDR analysis.
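As a rough illustration of two ingredients mentioned above, the sketch below uses Python with NumPy (an assumption; the abstract names no implementation). It computes a standard empirical cFDR estimate, cFDR(p | q) ≈ p · #{j: q_j ≤ q} / #{j: p_j ≤ p, q_j ≤ q}, and scores each observation with an estimate built only from the other folds, mirroring the idea of keeping each map independent of the observations it tests. It is not the authors' method: it omits their adjusted estimator and the step that turns the cFDR map into a quantity with controlled type-1 error. The function names `empirical_cfdr` and `cross_fold_cfdr` and the random fold assignment are hypothetical.

```python
import numpy as np


def empirical_cfdr(p, q, p_ref, q_ref):
    """Empirical cFDR at query points (p, q) using a reference sample.

    Estimates cFDR(p | q) as p * #{j: q_j <= q} / #{j: p_j <= p, q_j <= q}.
    This is a standard empirical estimator, not the adjusted estimator
    proposed in the paper (hypothetical illustration only).
    """
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    p_ref, q_ref = np.asarray(p_ref, dtype=float), np.asarray(q_ref, dtype=float)
    # Rows index query points, columns index reference points.
    q_le = q_ref[None, :] <= q[:, None]
    pq_le = q_le & (p_ref[None, :] <= p[:, None])
    n_q = q_le.sum(axis=1)
    n_pq = pq_le.sum(axis=1)
    # Where no reference point falls in the joint region, return 1 (no evidence).
    with np.errstate(divide="ignore", invalid="ignore"):
        est = np.where(n_pq > 0, p * n_q / n_pq, 1.0)
    # Cap at 1, since the estimate bounds a probability.
    return np.minimum(est, 1.0)


def cross_fold_cfdr(p, q, n_folds=3, seed=0):
    """Score each observation with a cFDR map fitted on the other folds only.

    Folds are assigned at random here, which is an assumption of this sketch.
    """
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    rng = np.random.default_rng(seed)
    folds = rng.integers(0, n_folds, size=p.size)
    out = np.empty_like(p)
    for k in range(n_folds):
        test = folds == k
        # The map used on fold k never sees fold k's own observations.
        out[test] = empirical_cfdr(p[test], q[test], p[~test], q[~test])
    return out
```

With random folds, the held-out scores are independent of their own map only if observations in different folds are roughly independent; for correlated data, such as neighbouring genetic variants, a split that respects the correlation structure would likely be needed.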