Ockerman Franklin, Chen Brian, Sun Quan, Kharitonova Elena V, Chen Walter, Zhou Laura Y, Loos Ruth J F, Kooperberg Charles, Peters Ulrike, Haessler Jeffrey, Reiner Alexander, Jung Su Yon, Manson JoAnn E, Nassir Rami, North Kari E, Buyske Steven, Haiman Christopher A, Conti David V, Wilkens Lynne R, Lange Ethan M, Cox Nancy J, Cao Hongyuan, Raffield Laura M, Li Yun, Tao Ran
Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA.
Center for Computation and Genomic Medicine, Children's Hospital of Philadelphia, Philadelphia, PA, USA.
bioRxiv. 2025 Aug 27:2025.08.26.671106. doi: 10.1101/2025.08.26.671106.
Polygenic scores (PGS) have promising clinical applications in risk stratification, disease screening, and personalized medicine. However, most PGS are trained on cohorts of predominantly European ancestry and have limited portability to external populations. While cross-population PGS methods have demonstrated greater generalizability than single-ancestry PGS, they do not properly account for individuals with recent admixture between continental ancestry groups. GAUDI is a recently proposed PGS method that addresses this gap by leveraging local ancestry to estimate ancestry-specific effects, penalizing but still allowing ancestry-differential effects. However, the modified fused LASSO approach used by GAUDI is computationally expensive and does not readily accommodate more than two-way admixture. To address these limitations, we introduce HAUDI, an efficient LASSO framework for admixed PGS construction. HAUDI re-parameterizes the GAUDI model as a standard LASSO problem, which allows extension to multi-way admixture settings and makes it far faster to compute than GAUDI. In extensive simulations, HAUDI compares favorably to GAUDI while dramatically reducing computation time. In real-data applications, HAUDI uniformly outperforms GAUDI across 18 clinical phenotypes, including total triglycerides (TG), C-reactive protein (CRP), and mean corpuscular hemoglobin concentration (MCHC), and shows substantial benefits over an ancestry-agnostic PGS for white blood cell count (WBC) and chronic kidney disease (CKD).
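The abstract does not spell out the re-parameterization, so the following is only a toy sketch of one way a penalty on ancestry-differential effects can be folded into a standard LASSO. It assumes two-way admixture, a reference effect `beta` shared by both ancestries, and a difference term `delta` for the second ancestry; the penalty weight `gamma`, the hand-written solver `lasso_cd`, and the simulated data are all hypothetical and are not taken from the HAUDI software.

```python
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Plain coordinate-descent LASSO: min (1/2n)||y - X b||^2 + lam * ||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.copy()  # equals y - X b while b is all zeros
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            rho = X[:, j] @ resid / n + col_sq[j] * b[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            resid += X[:, j] * (b[j] - new)
            b[j] = new
    return b

# Toy data (assumed): alleles and local-ancestry labels on two haplotypes.
rng = np.random.default_rng(0)
n, p = 200, 10
hap = rng.integers(0, 2, size=(n, p, 2))  # allele carried by each haplotype
anc = rng.integers(0, 2, size=(n, p, 2))  # local ancestry (0 or 1) of each haplotype

# Ancestry-specific dosages and the total dosage.
G0 = (hap * (anc == 0)).sum(axis=2).astype(float)
G1 = (hap * (anc == 1)).sum(axis=2).astype(float)
G = G0 + G1

# Simulated effects: mostly shared, one SNP with an ancestry-differential effect.
beta0 = np.zeros(p)
beta0[:3] = [0.5, -0.4, 0.3]
delta_true = np.zeros(p)
delta_true[0] = 0.3
beta1 = beta0 + delta_true
y = G0 @ beta0 + G1 @ beta1 + rng.normal(scale=0.1, size=n)

# Re-parameterization: G0 @ b0 + G1 @ b1 == G @ b0 + G1 @ (b1 - b0).
# Dividing the difference columns by gamma makes a single L1 penalty act as
# lam * (||beta||_1 + gamma * ||delta||_1), i.e. differences are penalized more.
gamma = 2.0  # hypothetical extra penalty on ancestry-differential effects
X = np.hstack([G, G1 / gamma])
Xc, yc = X - X.mean(axis=0), y - y.mean()

theta = lasso_cd(Xc, yc, lam=0.01)
beta_hat = theta[:p]
delta_hat = theta[p:] / gamma

# Score each individual with ancestry-specific effects.
pgs = G0 @ beta_hat + G1 @ (beta_hat + delta_hat)
```

Because the problem is an ordinary LASSO after this change of variables, any off-the-shelf LASSO solver could replace `lasso_cd`, and further ancestries would each contribute one more block of difference columns under the same assumed scheme.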