Unsupervised Ensemble Learning for Efficient Integration of Pre-trained Polygenic Risk Scores.

Author information

Duan Rui, Gao Chenyin, Tubbs Justin, Han Yi, Guo Min, Li Sijia, Ma Erica, Luo Dailin, Smoller Jordan, Lee Phil

Publication information

Res Sq. 2025 Apr 1:rs.3.rs-5976048. doi: 10.21203/rs.3.rs-5976048/v1.

Abstract

The growing availability of pre-trained polygenic risk score (PRS) models has enabled their integration into real-world applications, reducing the need for extensive data labeling, training, and calibration. However, selecting the most suitable PRS model for a specific target population remains challenging because of limited transferability, data heterogeneity, and the scarcity of observed phenotypes in real-world settings. Ensemble learning offers a promising avenue for improving the predictive accuracy of genetic risk assessment, but most existing methods rely on observed phenotype data or additional genome-wide association studies (GWAS) from the target population to optimize ensemble weights, which limits their utility for real-time implementation. Here, we present the UNSupervised enSemble PRS (UNSemblePRS), an unsupervised ensemble learning framework that combines pre-trained PRS models without requiring phenotype data or GWAS summary statistics from the target population. Unlike traditional supervised approaches, UNSemblePRS aggregates models based on their prediction concordance across a curated subset of candidate PRS models. We evaluated UNSemblePRS on both continuous and binary traits in the All of Us database, demonstrating its scalability and robust performance across diverse populations. These results position UNSemblePRS as an accessible tool for integrating PRS models into real-world contexts, with broad applicability as the number of available PRS models continues to grow.
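The abstract describes the aggregation step only at a high level: candidate models are combined according to how well their predictions agree, using no phenotypes. The sketch below is not the authors' UNSemblePRS algorithm; it illustrates one generic, phenotype-free way to turn agreement into ensemble weights, under stated assumptions. Each candidate PRS is standardized on the target individuals, pairwise Pearson correlation is used as the measure of concordance, a hypothetical threshold `min_mean_corr` stands in for the curation step (models that agree poorly with the rest are dropped), and the leading eigenvector of the remaining correlation matrix supplies the weights (a standard spectral heuristic). The function name `concordance_ensemble` and its parameter are hypothetical.

```python
import numpy as np


def concordance_ensemble(scores, min_mean_corr=0.1):
    """Phenotype-free ensemble of PRS columns, weighted by mutual agreement.

    Illustrative sketch only (hypothetical helper, not UNSemblePRS itself).
    scores: array of shape (n_individuals, n_models) holding raw PRS values
            computed on the same target individuals.
    min_mean_corr: hypothetical curation threshold; a model is kept only if
            its average correlation with the other models exceeds it.
    Returns (combined_score, weights, kept_mask).
    """
    scores = np.asarray(scores, dtype=float)
    n_models = scores.shape[1]
    if n_models < 2:
        raise ValueError("need at least two candidate PRS models")

    # Standardize each score so arbitrary scales across models do not matter
    # (assumes no column is constant).
    z = (scores - scores.mean(axis=0)) / scores.std(axis=0)

    # Pairwise Pearson correlation between models as prediction concordance.
    corr = np.corrcoef(z, rowvar=False)

    # Curation step: average off-diagonal correlation of each model.
    mean_corr = (corr.sum(axis=1) - 1.0) / (n_models - 1)
    kept = mean_corr > min_mean_corr
    if not kept.any():
        raise ValueError("no model passes the concordance threshold")

    # Weighting step: leading eigenvector of the kept models' correlations.
    sub = corr[np.ix_(kept, kept)]
    _, eigvecs = np.linalg.eigh(sub)  # eigenvalues come in ascending order
    lead = eigvecs[:, -1]
    if lead.sum() < 0:  # eigenvector sign is arbitrary; orient it positively
        lead = -lead
    weights = lead / np.abs(lead).sum()

    combined = z[:, kept] @ weights
    return combined, weights, kept


if __name__ == "__main__":
    # Simulated example: three noisy PRS sharing a latent liability g,
    # plus one score that is pure noise and should be curated out.
    rng = np.random.default_rng(0)
    n = 5000
    g = rng.normal(size=n)
    scores = np.column_stack(
        [g + s * rng.normal(size=n) for s in (0.5, 1.0, 2.0)]
        + [rng.normal(size=n)]
    )
    combined, weights, kept = concordance_ensemble(scores)
    print("kept:", kept)
    print("weights:", np.round(weights, 3))
    print("corr(combined, g):", round(float(np.corrcoef(combined, g)[0, 1]), 3))
```

The eigenvector weighting assumes that the candidate models share one common genetic signal and make roughly independent errors. Models derived from overlapping GWAS would agree with each other for reasons unrelated to accuracy and could be over-weighted, and concordance-based weights are not guaranteed to match the accuracy of the single best model.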

Similar articles

1. Unsupervised Ensemble Learning for Efficient Integration of Pre-trained Polygenic Risk Scores.
medRxiv. 2025 Mar 20:2025.01.06.25320058. doi: 10.1101/2025.01.06.25320058.
2. One score to rule them all: regularized ensemble polygenic risk prediction with GWAS summary statistics.
bioRxiv. 2024 Dec 4:2024.11.27.625748. doi: 10.1101/2024.11.27.625748.
3. An Ensemble Penalized Regression Method for Multi-ancestry Polygenic Risk Prediction.
bioRxiv. 2024 Apr 10:2023.03.15.532652. doi: 10.1101/2023.03.15.532652.
4. Optimizing and benchmarking polygenic risk scores with GWAS summary statistics.
Genome Biol. 2024 Oct 8;25(1):260. doi: 10.1186/s13059-024-03400-w.
5. An ensemble penalized regression method for multi-ancestry polygenic risk prediction.
Nat Commun. 2024 Apr 15;15(1):3238. doi: 10.1038/s41467-024-47357-7.
6. A Stacking Framework for Polygenic Risk Prediction in Admixed Individuals.
medRxiv. 2024 Feb 3:2024.01.31.24302103. doi: 10.1101/2024.01.31.24302103.
7. Fast and scalable ensemble learning method for versatile polygenic risk prediction.
Proc Natl Acad Sci U S A. 2024 Aug 13;121(33):e2403210121. doi: 10.1073/pnas.2403210121. Epub 2024 Aug 7.
