Oliver Pain
Maurice Wohl Clinical Neuroscience Institute, Department of Basic and Clinical Neuroscience, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK.
HGG Adv. 2025 Jul 18;6(4):100482. doi: 10.1016/j.xhgg.2025.100482.
Genome-wide association studies (GWASs) from multiple ancestral populations are increasingly available, offering opportunities to improve the accuracy and equity of polygenic scores (PGSs). Several methods now aim to leverage multiple GWAS sources, but their relative predictive performance and computational efficiency remain unclear, particularly when individual-level tuning data are unavailable. This study evaluates a comprehensive set of PGS methods across African (AFR), East Asian (EAS), and European (EUR) ancestries for 10 complex traits, using summary statistics from the Ugandan Genome Resource, Biobank Japan, UK Biobank, and the Million Veteran Program. Single-source PGSs were derived using methods including DBSLMM, lassosum, LDpred2, MegaPRS, pT + clump, PRS-CS, QuickPRS, and SBayesRC. Multi-source approaches included PRS-CSx, TL-PRS, X-Wing, and combinations of independently optimized single-source scores. All methods were restricted to HapMap3 variants and used linkage disequilibrium reference panels matching the superpopulation of each GWAS. A key contribution is a novel application of the LEOPARD method to estimate optimal linear combinations of population-specific PGSs using only summary statistics. Analyses were implemented using the open-source GenoPred pipeline. In AFR and EAS populations, PGSs combining ancestry-aligned and European GWASs outperformed single-source models. Linear combinations of independently optimized scores consistently outperformed current jointly optimized multi-source methods, while being substantially more computationally efficient. The LEOPARD extension offered a practical solution for tuning these combinations when only summary statistics were available, achieving performance comparable to tuning with individual-level data. These findings highlight a flexible and generalizable framework for multi-source PGS construction. The GenoPred pipeline supports more equitable, accurate, and accessible polygenic prediction.
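To make the idea of a linear combination of population-specific PGSs concrete, the following is a minimal sketch, not the GenoPred or LEOPARD implementation. It assumes that individual-level tuning data are available and estimates the mixing weights by ordinary least squares; the summary-statistic-only tuning described in the abstract is not reproduced here. The function names `standardize`, `fit_linear_combination`, and `combine`, and all simulated values, are hypothetical and chosen only for illustration.

```python
import numpy as np


def standardize(x):
    """Scale a score vector to mean 0 and unit variance."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()


def _standardized_columns(scores):
    """Standardize each PGS column of an (n_individuals, n_scores) array."""
    scores = np.asarray(scores, dtype=float)
    return np.column_stack([standardize(col) for col in scores.T])


def fit_linear_combination(scores, phenotype):
    """Estimate least-squares mixing weights in a tuning sample.

    scores: one column per population-specific PGS (assumed input layout).
    phenotype: observed trait values for the same individuals.
    Returns (intercept, weights).
    """
    z = _standardized_columns(scores)
    design = np.column_stack([np.ones(len(z)), z])
    coef, *_ = np.linalg.lstsq(design, np.asarray(phenotype, dtype=float), rcond=None)
    return coef[0], coef[1:]


def combine(scores, weights):
    """Apply mixing weights to standardized PGS columns."""
    return _standardized_columns(scores) @ weights


if __name__ == "__main__":
    # Simulated data only: a shared genetic signal measured with different
    # amounts of noise by two hypothetical single-source scores.
    rng = np.random.default_rng(0)
    n = 2000
    genetic = rng.normal(size=n)
    pgs_target = genetic + rng.normal(scale=1.5, size=n)  # ancestry-aligned GWAS
    pgs_eur = genetic + rng.normal(scale=1.0, size=n)     # European GWAS
    trait = genetic + rng.normal(size=n)

    scores = np.column_stack([pgs_target, pgs_eur])
    intercept, weights = fit_linear_combination(scores, trait)
    combined = combine(scores, weights)

    print("weights:", weights)
    print("r (target only):", np.corrcoef(pgs_target, trait)[0, 1])
    print("r (EUR only):", np.corrcoef(pgs_eur, trait)[0, 1])
    print("r (combined):", np.corrcoef(combined, trait)[0, 1])
```

Within the tuning sample, the combined score cannot correlate less strongly with the trait than either input score, because each single-score model is nested within the two-score regression. Whether that advantage holds out of sample is an empirical question of the kind the study evaluates.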