Ghosal Sayan, Schatz Michael C, Venkataraman Archana
Chan Zuckerberg Initiative Foundation, 94065, CA, USA.
Computer Science, Johns Hopkins University, Baltimore, MD, 21218, USA.
bioRxiv. 2024 Sep 8:2023.03.24.534116. doi: 10.1101/2023.03.24.534116.
We introduce BEATRICE, a novel framework for identifying putative causal variants from GWAS summary statistics. Identifying causal variants is challenging because they are sparse and highly correlated with nearby variants. To account for these challenges, we rely on a hierarchical Bayesian model that imposes a binary concrete prior on the set of causal variants. We derive a variational algorithm for this fine-mapping problem by minimizing the KL divergence between an approximate density and the posterior probability distribution of the causal configurations. Correspondingly, we use a deep neural network as an inference machine to estimate the parameters of our proposal distribution. Our stochastic optimization procedure allows us to simultaneously sample from the space of causal configurations. We use these samples to compute the posterior inclusion probabilities and determine credible sets for each causal variant. We conduct a detailed simulation study to quantify the performance of our framework against two state-of-the-art baseline methods across different numbers of causal variants and different noise paradigms, as defined by the relative genetic contributions of causal and non-causal variants. We demonstrate that BEATRICE achieves uniformly better coverage with comparable power and set sizes, and that the performance gain increases with the number of causal variants. We also show the efficacy of BEATRICE in finding causal variants from a GWAS of Alzheimer's disease. In comparison to the baselines, only BEATRICE successfully finds the APOE allele, a variant commonly associated with Alzheimer's disease. Thus, we show that BEATRICE is a valuable tool for identifying causal variants from eQTL and GWAS summary statistics across complex diseases and traits.
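To make the sampling idea concrete, the following is a minimal sketch of drawing relaxed causal indicators from a binary concrete distribution and summarizing them, not the BEATRICE implementation. The function names, the temperature value, the use of the per-variant sample mean as a rough proxy for a posterior inclusion probability, and the greedy cumulative-mass credible set are all assumptions made for illustration; the abstract does not specify how these quantities are computed.

```python
import numpy as np


def sample_binary_concrete(log_alpha, temperature, n_samples, rng):
    """Draw relaxed Bernoulli samples in (0, 1), one column per variant.

    log_alpha: per-variant logits (log-odds of being causal), shape (n_variants,).
    temperature: relaxation temperature; smaller values push samples toward 0 or 1.
    """
    # Clip the uniform draws away from 0 and 1 so the logs stay finite.
    u = rng.uniform(1e-8, 1.0 - 1e-8, size=(n_samples, log_alpha.shape[0]))
    logistic_noise = np.log(u) - np.log1p(-u)
    return 1.0 / (1.0 + np.exp(-(log_alpha + logistic_noise) / temperature))


def greedy_credible_set(weights, coverage=0.95):
    """Smallest set of variants whose normalized weights sum to at least `coverage`.

    This is a generic cumulative-mass construction, used here only as an example.
    """
    probs = weights / weights.sum()
    order = np.argsort(-probs)            # variants sorted by decreasing weight
    cumulative = np.cumsum(probs[order])
    k = int(np.searchsorted(cumulative, coverage)) + 1
    return order[:k].tolist()


rng = np.random.default_rng(0)
log_alpha = np.array([4.0, -4.0, -4.0, 0.0])   # hypothetical logits for 4 variants
samples = sample_binary_concrete(log_alpha, temperature=0.5, n_samples=1000, rng=rng)
pip_proxy = samples.mean(axis=0)               # Monte Carlo proxy, not the paper's estimator
credible = greedy_credible_set(np.array([0.9, 0.06, 0.03, 0.01]), coverage=0.95)
```

In this sketch a variant with a large positive logit yields samples concentrated near 1, so its sample mean is high, while variants with large negative logits stay near 0. In the framework described above, the logits would come from the neural network rather than being fixed by hand.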