Computational Biology Branch, National Center for Biotechnology Information, National Library of Medicine, NIH, Bethesda, MD 20892.
Center for Precision Health Research, National Human Genome Research Institute, NIH, Bethesda, MD 20892.
Proc Natl Acad Sci U S A. 2023 Aug 29;120(35):e2206612120. doi: 10.1073/pnas.2206612120. Epub 2023 Aug 21.
Genetic association studies have identified hundreds of independent signals associated with type 2 diabetes (T2D) and related traits. Despite these successes, the identification of specific causal variants underlying a genetic association signal remains challenging. In this study, we describe a deep learning (DL) method to analyze the impact of sequence variants on enhancers. Focusing on pancreatic islets, a T2D-relevant tissue, we show that our model learns islet-specific transcription factor (TF) regulatory patterns and can be used to prioritize candidate causal variants. At 101 genetic signals associated with T2D and related glycemic traits where multiple variants occur in linkage disequilibrium, our method nominates a single causal variant for each association signal, including three variants previously shown to alter reporter activity in islet-relevant cell types. For another signal associated with blood glucose levels, we biochemically test all candidate causal variants from statistical fine-mapping using a pancreatic islet beta cell line and show biochemical evidence of allelic effects on TF binding for the model-prioritized variant. To aid in future research, we publicly distribute our model and islet enhancer perturbation scores across ~67 million genetic variants. We anticipate that DL methods like the one presented in this study will enhance the prioritization of candidate causal variants for functional studies.
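The idea of scoring variants by their predicted effect on enhancer activity, then nominating one variant per association signal, can be illustrated with a minimal sketch. The abstract does not specify how the perturbation scores are computed, so everything here is an assumption: the score is taken as the difference in model output between the alternate and reference sequence, the nominated variant is the one with the largest absolute score, and `toy_enhancer_score` (with its made-up motif) is a hypothetical stand-in for a trained DL model, not the authors' model.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical variant record for illustration:
# (variant_id, reference sequence window, 0-based offset of the variant, alternate base)
Variant = Tuple[str, str, int, str]


def toy_enhancer_score(seq: str) -> float:
    """Stand-in for a trained enhancer model (assumption, not the published model).

    Counts occurrences of a made-up motif so the example runs without any
    deep learning library.
    """
    motif = "TGACG"  # made-up motif, chosen only for this example
    k = len(motif)
    return float(sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == motif))


def perturbation_score(model: Callable[[str], float],
                       ref_seq: str, offset: int, alt: str) -> float:
    """Assumed definition: model(alternate sequence) - model(reference sequence)."""
    alt_seq = ref_seq[:offset] + alt + ref_seq[offset + 1:]
    return model(alt_seq) - model(ref_seq)


def nominate(model: Callable[[str], float],
             signals: Dict[str, List[Variant]]) -> Dict[str, str]:
    """For each signal, pick the variant with the largest absolute score."""
    nominated = {}
    for signal_id, variants in signals.items():
        best = max(
            variants,
            key=lambda v: abs(perturbation_score(model, v[1], v[2], v[3])),
        )
        nominated[signal_id] = best[0]
    return nominated


# Two hypothetical variants in linkage disequilibrium at one signal.
signals = {
    "signal_1": [
        ("variant_A", "AATGACGTT", 4, "T"),  # disrupts the made-up motif
        ("variant_B", "AACCCCGTT", 0, "G"),  # leaves the toy score unchanged
    ]
}
print(nominate(toy_enhancer_score, signals))
```

In this toy setup `variant_A` is nominated because it removes the only motif match in its window, while `variant_B` does not change the score. A real analysis would replace `toy_enhancer_score` with the trained model's prediction for the sequence window.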