Lin Yi-Sian, Tan Taotao, Wang Ying, Pasaniuc Bogdan, Martin Alicia R, Atkinson Elizabeth G
bioRxiv. 2025 Mar 18:2025.03.18.644029. doi: 10.1101/2025.03.18.644029.
Polygenic scores (PGS) are widely used to estimate genetic predisposition to complex traits by aggregating the effects of common variants into a single measure. They hold promise for identifying individuals at increased risk of disease, enabling earlier screening and intervention. Genotyping arrays, commonly used for PGS computation, are affordable and computationally efficient, while whole-genome sequencing (WGS) offers a comprehensive view of genetic variation. Using the same set of individuals, we compared PGS derived from arrays and WGS across multiple traits to evaluate differences in predictive performance, portability across populations, and computational efficiency. We computed PGS for 10 traits spanning the spectrum of heritability and polygenicity in the three largest genetic ancestry groups in the study cohort (European, African American, Admixed American), trained on the multi-ancestry meta-analyses from the Pan-UK Biobank. Using the clumping and thresholding (C+T) method, we found that WGS-based PGS outperformed array-based PGS for highly polygenic traits but showed differentially reduced accuracy for sparse traits in certain populations. This may be attributable to the lower allele frequencies observed in clumped variants from WGS compared to arrays. Using the LD-informed PRS-CS method, we observed overall improved prediction performance compared to C+T, with WGS outperforming arrays across most non-cancer traits. In conclusion, while PGS computed using WGS generally provide superior predictive power with PRS-CS, the advantage over arrays is context-dependent, varying by trait, population, and PGS method. This study provides insights into the complexities and potential advantages of using different genotype discovery approaches for polygenic prediction in diverse populations.
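As an illustration of the aggregation the abstract describes, a PGS can be sketched as a weighted sum of effect-allele dosages, with the thresholding half of C+T dropping variants whose association p-value exceeds a cutoff. The following is a minimal sketch under stated assumptions, not the authors' pipeline: the function names, the toy effect sizes, p-values, dosages, and the 0.05 cutoff are all hypothetical, and the clumping step (which requires linkage disequilibrium information) is omitted.

```python
def threshold_weights(betas, p_values, p_cutoff):
    """Zero out effect sizes of variants whose p-value exceeds the cutoff.

    This mimics only the thresholding step of C+T; clumping is not shown.
    """
    return [b if p <= p_cutoff else 0.0 for b, p in zip(betas, p_values)]


def polygenic_score(dosages, weights):
    """Weighted sum of effect-allele dosages (0, 1, or 2 copies per variant)."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


# Hypothetical summary statistics for four variants (illustrative values only).
betas = [0.5, -0.25, 1.0, 0.125]
p_values = [1e-9, 0.03, 0.2, 1e-6]

# Hypothetical genotype of one individual at the same four variants.
dosages = [2, 1, 0, 1]

weights = threshold_weights(betas, p_values, p_cutoff=0.05)
score = polygenic_score(dosages, weights)
print(weights, score)
```

In this toy setup the third variant is excluded by the cutoff, so only three variants contribute to the individual's score. The same scoring function would apply to array-derived or WGS-derived genotypes; what differs between them is which variants are available to score.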