Tansey Wesley, Wang Yixin, Rabadan Raul, Blei David M
Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY, USA.
Department of Statistics, Columbia University, New York, NY, USA.
Int Stat Rev. 2020 Dec;88(Suppl 1):S91-S113. doi: 10.1111/insr.12430. Epub 2020 Nov 25.
Analyzing data from large-scale, multi-experiment studies requires scientists both to analyze each experiment and to assess the results as a whole. In this article, we develop double empirical Bayes testing (DEBT), an empirical Bayes method for analyzing multi-experiment studies when many covariates are gathered per experiment. DEBT is a two-stage method: in the first stage, it reports which experiments yielded significant outcomes; in the second stage, it hypothesizes which covariates drive the experimental significance. In both of its stages, DEBT builds on Efron (2008), which lays out an elegant empirical Bayes approach to testing. DEBT enhances this framework by learning a series of black-box predictive models to boost power and control the false discovery rate (FDR). In Stage 1, it uses a deep neural network prior to report which experiments yielded significant outcomes. In Stage 2, it uses an empirical Bayes version of the knockoff filter (Candes et al., 2018) to select covariates that have significant predictive power for Stage-1 significance. In both simulated and real data, DEBT increases the proportion of discovered significant outcomes and selects more features when signals are weak. In a real study of cancer cell lines, DEBT selects a robust set of biologically plausible genomic drivers of drug sensitivity and resistance in cancer.
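To make the two selection steps concrete, here is a minimal sketch of two generic rules that this kind of two-stage design rests on. It is not DEBT's implementation. Stage 1 is illustrated with an Efron-style two-groups model: each experiment's z-score comes from a N(0, 1) null or an alternative, and experiments are reported by ranking posterior null probabilities and keeping the largest set whose average stays at or below a target FDR. Stage 2 is illustrated with the standard knockoff+ threshold applied to precomputed feature statistics W_j. Several pieces are assumptions made only for illustration. The alternative is a fixed normal (`alt_mean`, `alt_sd`). The per-experiment prior null probabilities are supplied by hand, whereas DEBT learns them with a deep neural network. The knockoff statistics are taken as given rather than built from DEBT's empirical Bayes knockoff variant. All function names are hypothetical.

```python
import math
from typing import List, Sequence


def log_normal_pdf(x: float, mean: float = 0.0, sd: float = 1.0) -> float:
    """Log density of N(mean, sd^2) evaluated at x."""
    u = (x - mean) / sd
    return -0.5 * u * u - math.log(sd * math.sqrt(2.0 * math.pi))


def sigmoid(a: float) -> float:
    """Numerically stable logistic function 1 / (1 + exp(-a))."""
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)


def posterior_null_probs(z: Sequence[float], prior_null: Sequence[float],
                         alt_mean: float, alt_sd: float) -> List[float]:
    """Two-groups model: P(null | z_i) = pi_i f0(z_i) / (pi_i f0(z_i) + (1 - pi_i) f1(z_i)).

    f0 is the theoretical null N(0, 1). f1 = N(alt_mean, alt_sd^2) is an assumed,
    fixed alternative. prior_null[i] is a hypothetical per-experiment prior null
    probability standing in for a learned prior; each must lie strictly in (0, 1).
    """
    probs = []
    for zi, pi in zip(z, prior_null):
        if not 0.0 < pi < 1.0:
            raise ValueError("prior null probabilities must lie strictly in (0, 1)")
        # Log posterior odds of the null, computed in log space to avoid underflow.
        log_odds = (math.log(pi) - math.log1p(-pi)
                    + log_normal_pdf(zi) - log_normal_pdf(zi, alt_mean, alt_sd))
        probs.append(sigmoid(log_odds))
    return probs


def bayes_fdr_select(post_null: Sequence[float], alpha: float) -> List[int]:
    """Largest set of indices whose mean posterior null probability is <= alpha.

    Sorting ascending makes the prefix means nondecreasing, so the loop can stop
    at the first prefix whose mean exceeds alpha.
    """
    order = sorted(range(len(post_null)), key=lambda i: post_null[i])
    selected: List[int] = []
    total = 0.0
    for k, i in enumerate(order, start=1):
        total += post_null[i]
        if total / k > alpha:
            break
        selected.append(i)
    return sorted(selected)


def knockoff_plus_select(w: Sequence[float], q: float) -> List[int]:
    """Knockoff+ selection from feature statistics W_j (large positive = evidence for j).

    T = smallest t in {|W_j| : W_j != 0} with
        (1 + #{j : W_j <= -t}) / max(1, #{j : W_j >= t}) <= q;
    returns the indices {j : W_j >= T}, or [] if no threshold qualifies.
    """
    for t in sorted({abs(x) for x in w if x != 0}):
        fdp_hat = (1 + sum(1 for x in w if x <= -t)) / max(1, sum(1 for x in w if x >= t))
        if fdp_hat <= q:
            return [j for j, x in enumerate(w) if x >= t]
    return []
```

For example, `bayes_fdr_select([0.01, 0.5, 0.02, 0.2], 0.1)` keeps experiments 0, 2 and 3, because their average posterior null probability (about 0.077) is at most 0.1 and adding experiment 1 would push the average above it. The "+1" in the knockoff+ numerator is what gives that rule its finite-sample FDR guarantee. The first rule only controls a Bayesian (posterior expected) FDR, and only if the assumed model is correct.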