Mwesigwa Savannah, Dai Yulin, Enduru Nitesh, Zhao Zhongming
Center for Precision Health, McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, United States.
Faillace Department of Psychiatry and Behavioral Sciences, McGovern Medical School, The University of Texas Health Science Center at Houston, Houston, TX, United States.
Front Genet. 2025 Feb 4;16:1507395. doi: 10.3389/fgene.2025.1507395. eCollection 2025.
Polygenic scores (PGSs) summarize the cumulative effect of genetic risk variants associated with complex diseases such as Alzheimer's disease (AD). The PGS Catalog is a valuable repository of PGSs for many complex diseases, but it lacks standardized annotations and harmonization, which makes it difficult to integrate the information for a specific disease.
In this study, we curated 44 PGS datasets for AD from the PGS Catalog, categorized them into five methodological groups, and annotated 813,257 variants to nearby genes. We aligned the scores based on the "GWAS significant variants" (GWAS-SV) method with the GWAS Catalog and flagged redundant files and those with a "limited scope" due to insufficient external GWAS support. Using rank aggregation (RA), we prioritized consistently important variants and provided an R package, "PgsRankRnnotatR," to automate this process.
Of the six RA methods evaluated, the "Dowdall" method was the most robust. Our refined dataset, enhanced by multiple RA options, is a valuable resource for AD researchers selecting PGSs or exploring AD-related genetic variants.
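As an illustration of the rank aggregation idea, the following is a minimal Python sketch of Dowdall-style scoring, in which a variant at position r in a ranked list receives 1/r points and the points are summed across lists. It is not the implementation in the authors' R package. The function name `dowdall_aggregate`, the variant identifiers, and the assumption that each PGS has already been converted into a list ordered from most to least important (for example, by absolute effect weight) are all hypothetical choices made for this example.

```python
from collections import defaultdict


def dowdall_aggregate(rankings):
    """Aggregate ranked lists with Dowdall (harmonic) scoring.

    rankings: list of lists; each inner list holds variant IDs ordered
    from most to least important within one score (assumed input format).
    Returns (variant, score) pairs sorted by descending aggregate score,
    with ties broken alphabetically by variant ID.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for position, variant in enumerate(ranking, start=1):
            # A variant at position r contributes 1/r; variants absent
            # from a list contribute nothing for that list (assumption).
            scores[variant] += 1.0 / position
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical variant IDs ranked within three hypothetical scores.
example_rankings = [
    ["var_A", "var_B", "var_C"],
    ["var_B", "var_A"],
    ["var_A", "var_C", "var_B"],
]

aggregated = dowdall_aggregate(example_rankings)
for variant, score in aggregated:
    print(f"{variant}\t{score:.3f}")
```

Because the contribution falls off harmonically with rank, variants near the top of several lists dominate the aggregate, while variants that appear only low in a few lists add little.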
Our approach offers a framework for curating, harmonizing, and prioritizing PGS datasets, improving their usability for AD research. By integrating multiple RA methods and automating the process, we provide a flexible tool that enhances PGS selection and genetic variant exploration. This framework can be extended to other complex diseases or traits, facilitating broader applications in genetic risk assessment.