A penalized regression framework for building polygenic risk models based on summary statistics from genome-wide association studies and incorporating external information.

Author information

Chen Ting-Huei, Chatterjee Nilanjan, Landi Maria Teresa, Shi Jianxin

Affiliations

Department of Mathematics and Statistics, Regular member, Cervo Brain Research Centre, University of Laval, 1045, av. of Medicine, Suite 1056, Quebec G1V 0A6, Canada.

Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University Baltimore, Maryland, United States of America, 615 N Wolfe Street Baltimore, MD 21205.

Publication information

J Am Stat Assoc. 2021;116(533):133-143. doi: 10.1080/01621459.2020.1764849. Epub 2020 Oct 12.

DOI:10.1080/01621459.2020.1764849

PMID:34483403

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8414872/

Abstract

Large-scale genome-wide association (GWAS) studies provide opportunities for developing genetic risk prediction models that have the potential to improve disease prevention, intervention or treatment. The key step is to develop polygenic risk score (PRS) models with high predictive performance for a given disease, which typically requires a large training data set for selecting truly associated single nucleotide polymorphisms (SNPs) and estimating effect sizes accurately. Here, we develop a comprehensive penalized regression for fitting regularized regression models to GWAS summary statistics. We propose incorporating Pleiotropy and ANnotation information into PRS (PANPRS) development through suitable formulation of penalty functions and associated tuning parameters. Extensive simulations show that PANPRS performs equally well or better than existing PRS methods when no functional annotation or pleiotropy is incorporated. When functional annotation data and pleiotropy are informative, PANPRS substantially outperforms existing PRS methods in simulations. Finally, we applied our methods to build PRS for type 2 diabetes and melanoma and found that incorporating relevant functional annotations and GWAS of genetically related traits improved prediction of these two complex diseases.
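As an illustration of what fitting a regularized regression to GWAS summary statistics can look like, the following is a minimal sketch of a lasso solved by coordinate descent using only marginal SNP-trait correlations and an LD matrix. It is not the PANPRS implementation: the abstract does not specify the penalty functions, so the objective used here (0.5 * b'Rb - b'r + sum_j lam_j * |b_j|), the function names, and the per-SNP penalty vector (a hypothetical stand-in for annotation-informed penalties) are all assumptions made for illustration.

```python
import numpy as np


def soft_threshold(z, lam):
    """Soft-thresholding operator: shrink z toward zero by lam."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)


def lasso_from_summary(r, R, lam, n_iter=200, tol=1e-8):
    """Coordinate descent for 0.5*b'Rb - b'r + sum_j lam_j*|b_j|.

    r   : marginal SNP-trait correlations from GWAS summary statistics (length p)
    R   : p x p LD (SNP correlation) matrix, e.g. from a reference panel
    lam : scalar penalty, or a length-p vector of per-SNP penalties
          (hypothetical: a smaller value could favor annotated SNPs)
    """
    r = np.asarray(r, dtype=float)
    R = np.asarray(R, dtype=float)
    p = r.shape[0]
    lam = np.broadcast_to(np.asarray(lam, dtype=float), (p,))
    beta = np.zeros(p)
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            # Correlation of SNP j with the trait after removing the
            # contribution of all other SNPs currently in the model.
            z = r[j] - R[j] @ beta + R[j, j] * beta[j]
            new = soft_threshold(z, lam[j]) / R[j, j]
            max_change = max(max_change, abs(new - beta[j]))
            beta[j] = new
        if max_change < tol:
            break
    return beta


# Toy usage: two SNPs in LD, only the first one truly associated.
R_toy = np.array([[1.0, 0.5], [0.5, 1.0]])
r_toy = R_toy @ np.array([0.5, 0.0])
beta_toy = lasso_from_summary(r_toy, R_toy, lam=0.05)
```

Passing a vector for `lam` lets different SNPs receive different amounts of shrinkage, which is one simple way external information could enter the penalty; how PANPRS actually combines annotation and pleiotropy is described in the full text, not here.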
