Das Anjali, Lakhani Chirag, Terwagne Chloé, Lin Jui-Shan T, Naito Tatsuhiko, Raj Towfique, Knowles David A
Computer Science, Columbia University, New York, NY, USA; New York Genome Center, New York, NY, USA.
New York Genome Center, New York, NY, USA.
Am J Hum Genet. 2025 Aug 13. doi: 10.1016/j.ajhg.2025.07.016.
Increased availability of whole-genome sequencing (WGS) has facilitated the study of rare variants (RVs) in complex diseases. Multiple RV association tests are available to study the relationship between genotype and phenotype, but most do not fully leverage the availability of variant-level functional annotations. We propose genome-wide rare variant enrichment evaluation (gruyere), an empirical Bayesian framework that complements existing methods by learning global, trait-specific weights for functional annotations to improve variant prioritization. We apply gruyere to WGS data from the Alzheimer's Disease Sequencing Project to identify Alzheimer disease (AD)-associated genes and annotations. Growing evidence suggests that the disruption of microglial regulation is a key contributor to AD risk, yet existing methods have not examined rare non-coding effects that incorporate such cell-type-specific information. To address this gap, we (1) define per-gene non-coding RV test sets using predicted enhancer and promoter regions in microglia and other brain cell types (oligodendrocytes, astrocytes, and neurons) and (2) include cell-type-specific variant effect predictions (VEPs) as functional annotations. gruyere identifies 13 significant genetic associations not detected by other RV methods, four of which remain significant in omnibus tests. We find that deep-learning-based VEPs for splicing, transcription factor binding, and chromatin state are highly predictive of functional non-coding RVs. Our study establishes a robust framework incorporating functional annotations, coding RVs, and cell-type-associated non-coding RVs to perform genome-wide association tests, uncovering AD-relevant genes and annotations.
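The idea of annotation-informed variant weighting can be illustrated with a minimal sketch. This is not the gruyere model itself, whose details the abstract does not give. It assumes, purely for illustration, that each variant's weight is a logistic function of its annotation vector and a shared weight vector `tau`, and that a per-individual gene burden is the weighted sum of rare-allele counts. The function names, the logistic link, and the toy data are all hypothetical.

```python
import math

def variant_weights(annotations, tau):
    """Map each variant's annotation vector to a weight in (0, 1).

    Assumption: a logistic link on the dot product of annotations and
    the shared annotation weights tau (illustrative, not from the paper).
    """
    weights = []
    for row in annotations:
        score = sum(a * t for a, t in zip(row, tau))
        weights.append(1.0 / (1.0 + math.exp(-score)))
    return weights

def burden_scores(genotypes, weights):
    """Per-individual burden: sum over variants of allele count times weight."""
    return [sum(g * w for g, w in zip(person, weights)) for person in genotypes]

# Toy data (hypothetical): 3 variants x 2 annotations, 3 individuals x 3 variants.
annotations = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
tau = [2.0, -1.0]  # shared annotation weights; here fixed rather than learned
genotypes = [[0, 1, 0], [1, 0, 1], [0, 0, 0]]

weights = variant_weights(annotations, tau)
scores = burden_scores(genotypes, weights)
```

In an actual analysis the annotation weights would be estimated from data across genes rather than fixed by hand, and the burden scores would enter a regression on the phenotype together with covariates.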