Venkatesh Rasika, Cardone Katie M, Bradford Yuki, Moore Anni K, Kumar Rachit, Moore Jason H, Shen Li, Kim Dokyoon, Ritchie Marylyn D
Genomics and Computational Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA.
Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA.
medRxiv. 2025 Jun 2:2025.05.31.25328688. doi: 10.1101/2025.05.31.25328688.
Alzheimer's Disease (AD) is the most prevalent condition affecting the aging population, and neither an effective treatment nor a single underlying causal factor has been identified. Because AD is a complex disease, characterizing the genetic risk of developing it has proven difficult; polygenic scores (PGS) use only common variants, which fail to fully capture disease heterogeneity. This study used univariate and multivariate approaches to characterize AD risk. Genome-, transcriptome-, and proteome-wide association studies (GWAS, TWAS, and PWAS) were conducted on 15,480 individuals from the Alzheimer's Disease Sequencing Project (ADSP) R4 release to identify AD-associated signals, followed by pathway enrichment analysis. Integrative risk models (IRMs) were developed using the genetically regulated components of gene and protein expression together with clinical covariates. Elastic-net logistic regression and random forest classifiers were evaluated using the area under the receiver operating characteristic curve (AUROC), the area under the precision-recall curve (AUPRC), F1-score, and balanced accuracy. These IRMs were compared against baseline PGS and covariate models. We identified 104 genomic, 319 transcriptomic, and 17 proteomic associations with AD at the respective significance thresholds. Putatively novel associations were enriched in signaling, myeloid differentiation, and immune pathways. The best-performing IRM, a random forest with transcriptomic and covariate features, achieved an AUROC of 0.703 and an AUPRC of 0.622, significantly outperforming the PGS and baseline models. Integrating univariate discovery approaches with multivariate modeling enhances AD risk prediction and offers insights into underlying biological processes.
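The four evaluation metrics named in the abstract can be illustrated with a minimal pure-Python sketch. It assumes binary labels (1 = AD case, 0 = control), a classifier's predicted probabilities, and a hypothetical 0.5 cut-off for the threshold-dependent metrics. The labels and scores are invented toy values, not ADSP data, and AUPRC is estimated here by average precision, one common estimator. This is an illustration only, not the authors' evaluation code.

```python
# Hypothetical toy example: labels and scores are invented, not study data.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]                  # 1 = case, 0 = control
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]  # predicted P(case)
y_pred = [1 if s >= 0.5 else 0 for s in scores]    # assumed 0.5 threshold


def confusion(y_true, y_pred):
    """Count true/false positives and negatives for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def f1(y_true, y_pred):
    """Harmonic mean of precision and recall, written as 2TP / (2TP + FP + FN)."""
    tp, fp, _, fn = confusion(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity; robust to case/control imbalance."""
    tp, fp, tn, fn = confusion(y_true, y_pred)
    return (tp / (tp + fn) + tn / (tn + fp)) / 2


def auroc(y_true, scores):
    """Probability that a random case outscores a random control (ties count 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, scores):
    """AUPRC estimate: mean precision at the rank of each case (scores sorted descending)."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for k, (_, t) in enumerate(ranked, start=1):
        if t == 1:
            hits += 1
            total += hits / k
    return total / sum(y_true)


print(f1(y_true, y_pred), balanced_accuracy(y_true, y_pred))
print(auroc(y_true, scores), round(average_precision(y_true, scores), 3))
```

F1-score and balanced accuracy depend on the chosen cut-off, whereas AUROC and AUPRC summarize how well the scores rank cases above controls across all cut-offs; AUPRC is also more sensitive to the proportion of cases, which is one reason to report it alongside AUROC.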