Duan Rui, Gao Chenyin, Tubbs Justin, Han Yi, Guo Min, Li Sijia, Ma Erica, Luo Dailin, Smoller Jordan, Lee Phil
Res Sq. 2025 Apr 1:rs.3.rs-5976048. doi: 10.21203/rs.3.rs-5976048/v1.
The growing availability of pre-trained polygenic risk score (PRS) models has enabled their integration into real-world applications, reducing the need for extensive data labeling, training, and calibration. However, selecting the most suitable PRS model for a specific target population remains challenging because of limited transferability, data heterogeneity, and the scarcity of observed phenotypes in real-world settings. Ensemble learning offers a promising way to improve the predictive accuracy of genetic risk assessments, but most existing methods rely on observed phenotype data or additional genome-wide association studies (GWAS) from the target population to optimize ensemble weights, which limits their utility in real-time implementation. Here, we present UNSupervised enSemble PRS (UNSemblePRS), an unsupervised ensemble learning framework that combines pre-trained PRS models without requiring phenotype data or summary data from the target population. Unlike traditional supervised approaches, UNSemblePRS aggregates models based on prediction concordance across a curated subset of candidate PRS models. We evaluated UNSemblePRS on both continuous and binary traits in the All of Us database, demonstrating its scalability and robust performance across diverse populations. These results highlight UNSemblePRS as an accessible tool for integrating PRS models into real-world contexts, with broad applicability as the availability of PRS models continues to expand.
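The abstract states that UNSemblePRS aggregates models by prediction concordance but does not say how concordance is turned into weights. The sketch below is only an illustration of the general idea, not the authors' algorithm: it assumes that each candidate PRS model has already been scored on the same target individuals, and it substitutes a simple spectral heuristic (the leading eigenvector of the between-model correlation matrix) for whatever weighting rule the paper actually uses. The function names `standardize`, `concordance_weights`, and `ensemble_score` are hypothetical, and the curation of the candidate subset is not modeled.

```python
import numpy as np


def standardize(scores):
    """Center and scale each model's scores (columns) to mean 0, SD 1.

    Assumes every column has non-zero variance.
    """
    scores = np.asarray(scores, dtype=float)
    return (scores - scores.mean(axis=0)) / scores.std(axis=0)


def concordance_weights(scores):
    """Derive ensemble weights from agreement between models, with no phenotype.

    ASSUMPTION: weights are taken from the leading eigenvector of the
    model-by-model correlation matrix. This is an illustrative stand-in,
    not the weighting rule described in the paper.
    """
    # Pairwise correlation between the models' predictions (m x m).
    corr = np.corrcoef(np.asarray(scores, dtype=float), rowvar=False)
    # eigh returns eigenvalues in ascending order; the last column is the
    # eigenvector of the largest eigenvalue.
    _, eigvecs = np.linalg.eigh(corr)
    lead = eigvecs[:, -1]
    # Eigenvector sign is arbitrary; orient it so most loadings are positive.
    if lead.sum() < 0:
        lead = -lead
    # Models that disagree with the consensus get zero weight.
    lead = np.clip(lead, 0.0, None)
    return lead / lead.sum()


def ensemble_score(scores):
    """Weighted sum of standardized PRS predictions for each individual."""
    return standardize(scores) @ concordance_weights(scores)


if __name__ == "__main__":
    # Simulated example: three noisy views of one latent risk plus pure noise.
    rng = np.random.default_rng(1)
    latent = rng.standard_normal(500)
    demo = np.column_stack([
        latent + 0.5 * rng.standard_normal(500),
        latent + 1.0 * rng.standard_normal(500),
        latent + 2.0 * rng.standard_normal(500),
        rng.standard_normal(500),
    ])
    print(concordance_weights(demo))
```

Under this heuristic, a model whose predictions agree with most of the others receives a larger weight, while a model that is uncorrelated with the rest is pushed toward zero. No phenotype or target-population GWAS enters the calculation, which is the property the abstract emphasizes.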