Kerner Gaspard, Kamitaki Nolan, Strober Benjamin, Price Alkes L
Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA.
Department of Biomedical Informatics, Harvard Medical School, Boston, MA.
medRxiv. 2025 May 6:2025.05.05.25327017. doi: 10.1101/2025.05.05.25327017.
Genome-wide association studies (GWAS) have identified thousands of disease-associated loci, yet their interpretation remains limited by the heterogeneity of underlying biological processes. We propose Joint Pleiotropic and Epigenomic Partitioning (J-PEP), a clustering framework that integrates pleiotropic SNP effects on auxiliary traits and tissue-specific epigenomic data to partition disease-associated loci into biologically distinct clusters. To benchmark J-PEP against existing methods, we introduce a metric, Pleiotropic and Epigenomic Prediction Accuracy (PEPA), that evaluates how well the clusters predict SNP-to-trait and SNP-to-tissue associations using off-chromosome data, avoiding overfitting. Applying J-PEP to GWAS summary statistics for 165 diseases/traits (average ), we attained 16-30% higher PEPA than pleiotropic or epigenomic partitioning approaches, with larger improvements for well-powered traits, consistent with simulations; these gains arise from J-PEP's tendency to upweight correlated structure (signals present in both the auxiliary trait and tissue data), thereby emphasizing shared components. For type 2 diabetes (T2D), J-PEP identified clusters refining canonical pathological processes while revealing underexplored immune and developmental signals. For hypertension (HTN), J-PEP identified stromal and adrenal-endocrine processes that were not identified in prior analyses. For neutrophil count, J-PEP identified hematopoietic, hepatic-inflammatory, and neuroimmune processes, expanding biological interpretation beyond classical immune regulation. Notably, integrating single-cell chromatin accessibility data refined bulk-based clusters, enhancing cell-type resolution and specificity.
For T2D, single-cell data refined a bulk endocrine cluster to pancreatic islet β-cells, consistent with the established role of β-cell dysfunction in insulin deficiency; for HTN, single-cell data refined a bulk endocrine cluster to adrenal cortex cells, consistent with a Gene Ontology (GO) enrichment for neutrophil-mediated inflammation that implicates feedback between aldosterone production in the adrenal gland and local immune signaling. In conclusion, J-PEP provides a principled framework for partitioning GWAS loci into interpretable, tissue-informed clusters that provide biological insights on complex disease.