An Ensemble Penalized Regression Method for Multi-ancestry Polygenic Risk Prediction.

Authors

Zhang Jingning, Zhan Jianan, Jin Jin, Ma Cheng, Zhao Ruzhang, O'Connell Jared, Jiang Yunxuan, Koelsch Bertram L, Zhang Haoyu, Chatterjee Nilanjan

Affiliations

Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA.

23andMe Inc., Sunnyvale, CA, USA.

Publication information

bioRxiv. 2024 Apr 10:2023.03.15.532652. doi: 10.1101/2023.03.15.532652.

DOI:10.1101/2023.03.15.532652

PMID:36993331

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10055041/

Abstract

Great efforts are being made to develop advanced polygenic risk scores (PRS) to improve the prediction of complex traits and diseases. However, most existing PRS are trained primarily on European ancestry populations, which limits their transferability to non-European populations. In this article, we propose a novel method for generating multi-ancestry Polygenic Risk scOres based on enSemble of PEnalized Regression models (PROSPER). PROSPER integrates genome-wide association study (GWAS) summary statistics from diverse populations to develop ancestry-specific PRS with improved predictive power for minority populations. The method uses a combination of L1 (lasso) and L2 (ridge) penalty functions, a parsimonious specification of the penalty parameters across populations, and an ensemble step that combines PRS generated across different penalty parameters. We evaluate the performance of PROSPER and other existing methods on large-scale simulated and real datasets, including those from 23andMe Inc., the Global Lipids Genetics Consortium, and All of Us. Results show that PROSPER can substantially improve multi-ancestry polygenic prediction compared to alternative methods across a wide variety of genetic architectures. In real data analyses, for example, PROSPER increased out-of-sample prediction R² for continuous traits by an average of 70% compared to a state-of-the-art Bayesian method (PRS-CSx) in the African ancestry population. Further, PROSPER is computationally highly scalable for the analysis of large SNP contents and many diverse populations.
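To make the two ingredients named in the abstract more concrete (combined lasso and ridge penalties, followed by an ensemble over penalty settings), here is a minimal toy sketch. It is not the PROSPER implementation: it assumes individual-level data from a single simulated population rather than multi-ancestry GWAS summary statistics, it applies the ridge penalty to the coefficients themselves rather than across populations, and it uses plain least squares as a stand-in for the ensemble step. The function names `elastic_net_cd`, `fit_ensemble`, and `predict`, the penalty grid, and the simulated data are all hypothetical.

```python
import numpy as np

def elastic_net_cd(X, y, lam1, lam2, n_iter=200):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam1*||b||_1 + (lam2/2)*||b||_2^2."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.astype(float)  # running residual y - X @ b (b starts at zero)
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * b[j]            # remove feature j's current contribution
            rho = X[:, j] @ r / n
            # Soft-threshold for the lasso part, extra shrinkage for the ridge part.
            b[j] = np.sign(rho) * max(abs(rho) - lam1, 0.0) / (col_sq[j] + lam2)
            r -= X[:, j] * b[j]            # add the updated contribution back
    return b

def fit_ensemble(X_train, y_train, X_tune, y_tune, grid):
    """Fit one model per (lam1, lam2) pair, then weight the candidate scores on a tuning set."""
    betas = np.column_stack(
        [elastic_net_cd(X_train, y_train, l1, l2) for l1, l2 in grid]
    )
    scores = X_tune @ betas                # one candidate score per column
    design = np.column_stack([np.ones(len(y_tune)), scores])
    # Assumption: ordinary least squares stands in for the ensemble learner.
    w, *_ = np.linalg.lstsq(design, y_tune, rcond=None)
    return betas, w

def predict(X, betas, w):
    """Combine the candidate scores using the ensemble weights (w[0] is the intercept)."""
    return w[0] + (X @ betas) @ w[1:]

# Simulated toy data (hypothetical): 5 causal features out of 50.
rng = np.random.default_rng(0)
n, p = 600, 50
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:5] = 0.5
y = X @ beta_true + rng.standard_normal(n)

X_train, y_train = X[:300], y[:300]
X_tune, y_tune = X[300:450], y[300:450]
X_test, y_test = X[450:], y[450:]

grid = [(l1, l2) for l1 in (0.01, 0.05, 0.2) for l2 in (0.0, 0.1)]
betas, w = fit_ensemble(X_train, y_train, X_tune, y_tune, grid)
r2 = np.corrcoef(predict(X_test, betas, w), y_test)[0, 1] ** 2
print(f"out-of-sample R^2 of the ensemble score: {r2:.3f}")
```

Splitting the data into training, tuning, and test sets keeps the ensemble weights from being evaluated on the same samples used to choose them, which is the sense in which the reported R² is out-of-sample.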
