Department of Biostatistics, University of Nebraska Medical Center, Omaha, Nebraska, USA.
Biometrics. 2023 Dec;79(4):3497-3509. doi: 10.1111/biom.13848. Epub 2023 Mar 15.
False discovery rate (FDR) controlling procedures provide important statistical guarantees for replicability in signal identification based on multiple hypothesis testing. In many fields of study, FDR controlling procedures are used in high-dimensional (HD) analyses to discover features that are truly associated with the outcome. In some recent applications, data on the same set of candidate features are independently collected in multiple different studies. For example, gene expression data are collected at different facilities and with different cohorts to identify the genetic biomarkers of multiple types of cancers. These studies provide opportunities to identify signals by jointly considering information from different sources (with potential heterogeneity). This paper addresses how to provide FDR control guarantees for tests of union null hypotheses of conditional independence. We present a knockoff-based variable selection method (Simultaneous knockoffs) to identify mutual signals from multiple independent datasets, providing exact FDR control guarantees under finite sample settings. This method can work with very general model settings and test statistics. We demonstrate the performance of this method with extensive numerical studies and two real-data examples.
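For readers unfamiliar with knockoff-based selection, the following is a minimal sketch of the generic knockoff+ thresholding step from the single-dataset knockoff filter literature. It is not the Simultaneous knockoffs procedure itself: the abstract does not specify how statistics from multiple datasets are combined, so that part is left out. The function names `knockoff_plus_threshold` and `knockoff_select` and the input vector `W` (one signed feature statistic per candidate feature, where large positive values suggest a signal) are illustrative assumptions.

```python
import numpy as np


def knockoff_plus_threshold(W, q):
    """Return the knockoff+ threshold for signed statistics W at target FDR q.

    Assumption: W[j] is a signed statistic for feature j whose sign is
    symmetric under the null. The threshold is the smallest t among the
    nonzero |W[j]| such that
        (1 + #{j: W[j] <= -t}) / max(1, #{j: W[j] >= t}) <= q.
    Returns np.inf when no candidate satisfies the bound.
    """
    W = np.asarray(W, dtype=float)
    # Candidate thresholds: distinct nonzero magnitudes, in ascending order.
    candidates = np.unique(np.abs(W[W != 0]))
    for t in candidates:
        fdp_hat = (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_hat <= q:
            return t
    return np.inf


def knockoff_select(W, q):
    """Return indices of features whose statistic meets the knockoff+ threshold."""
    W = np.asarray(W, dtype=float)
    return np.flatnonzero(W >= knockoff_plus_threshold(W, q))


if __name__ == "__main__":
    # Hypothetical statistics for ten candidate features.
    W = [5, 4, 3, 2.5, 2, 1.5, -1, 0.5, -0.3, 0.2]
    print(knockoff_select(W, q=0.25))
```

The "+1" in the numerator is what distinguishes knockoff+ from the plain knockoff threshold; it makes the estimate of the false discovery proportion conservative, which is the usual route to finite-sample FDR control in this family of methods.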