Das Anjali, Lakhani Chirag, Terwagne Chloé, Lin Jui-Shan T, Naito Tatsuhiko, Raj Towfique, Knowles David A
Computer Science, Columbia University, New York, NY, USA.
New York Genome Center, New York, NY, USA.
medRxiv. 2025 Mar 4:2024.12.06.24318577. doi: 10.1101/2024.12.06.24318577.
The increasing availability of whole-genome sequencing (WGS) has begun to elucidate the contribution of rare variants (RVs), both coding and non-coding, to complex disease. Multiple RV association tests are available to study the relationship between genotype and phenotype, but most are restricted to per-gene models and do not fully leverage the availability of variant-level functional annotations. We propose Genome-wide Rare Variant EnRichment Evaluation (gruyere), a Bayesian probabilistic model that complements existing methods by learning global, trait-specific weights for functional annotations to improve variant prioritization. We apply gruyere to WGS data from the Alzheimer's Disease (AD) Sequencing Project, consisting of 7,966 cases and 13,412 controls, to identify AD-associated genes and annotations. Growing evidence suggests that disruption of microglial regulation is a key contributor to AD risk, yet existing methods have not had sufficient power to examine rare non-coding effects that incorporate such cell-type specific information. To address this gap, we 1) use predicted enhancer and promoter regions in microglia and other potentially relevant cell types (oligodendrocytes, astrocytes, and neurons) to define per-gene non-coding RV test sets and 2) include cell-type specific variant effect predictions (VEPs) as functional annotations. gruyere identifies 15 significant genetic associations not detected by other RV methods and finds deep learning-based VEPs for splicing, transcription factor binding, and chromatin state are highly predictive of functional non-coding RVs. Our study establishes a novel and robust framework incorporating functional annotations, coding RVs, and cell-type associated non-coding RVs, to perform genome-wide association tests, uncovering AD-relevant genes and annotations.
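The abstract does not give the model's equations, so the following is only a minimal sketch of the general idea of annotation-weighted rare-variant burden, not the gruyere model itself. It assumes (hypothetically) that a shared weight vector `tau` maps each variant's functional annotations to a weight between 0 and 1, that a gene's burden is the weighted sum of an individual's rare alleles, and that case status follows a logistic link. It uses fixed point values where a Bayesian model would infer posterior distributions, and all function names, parameters, and toy inputs are invented for illustration.

```python
import numpy as np


def sigmoid(x):
    """Logistic function, applied elementwise."""
    return 1.0 / (1.0 + np.exp(-x))


def variant_weights(annotations, tau):
    """Map annotations (n_variants x n_annotations) to weights in (0, 1).

    tau is a single weight vector shared across all genes (an assumption
    standing in for the 'global, trait-specific' annotation weights).
    """
    return sigmoid(annotations @ tau)


def gene_burden(genotypes, weights):
    """Annotation-weighted burden per individual for one gene.

    genotypes: (n_samples x n_variants) rare-allele counts.
    """
    return genotypes @ weights


def case_probability(burden, gene_effect, intercept=0.0):
    """Probability of being a case under an assumed logistic link."""
    return sigmoid(intercept + gene_effect * burden)


# Toy inputs, invented for illustration only.
annotations = np.array([[1.0, 0.0],
                        [0.0, 1.0],
                        [1.0, 1.0]])
tau = np.array([2.0, -1.0])          # hypothetical annotation weights
genotypes = np.array([[1, 0, 1],     # carrier of variants 0 and 2
                      [0, 0, 0]])    # non-carrier

weights = variant_weights(annotations, tau)
burden = gene_burden(genotypes, weights)
prob = case_probability(burden, gene_effect=0.5)
print(weights, burden, prob)
```

In this toy setup a non-carrier has zero burden and therefore sits at the baseline probability set by the intercept, while variants with annotations that receive larger weights in `tau` contribute more to a carrier's burden.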