A penalized regression framework for building polygenic risk models based on summary statistics from genome-wide association studies and incorporating external information.

Authors

Chen Ting-Huei, Chatterjee Nilanjan, Landi Maria Teresa, Shi Jianxin

Affiliations

Department of Mathematics and Statistics, Regular member, Cervo Brain Research Centre, University of Laval, 1045, av. of Medicine, Suite 1056, Quebec G1V 0A6, Canada.

Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, 615 N Wolfe Street, Baltimore, MD 21205, United States of America.

Publication information

J Am Stat Assoc. 2021;116(533):133-143. doi: 10.1080/01621459.2020.1764849. Epub 2020 Oct 12.

Abstract

Large-scale genome-wide association studies (GWAS) provide opportunities for developing genetic risk prediction models that have the potential to improve disease prevention, intervention, or treatment. The key step is to develop polygenic risk score (PRS) models with high predictive performance for a given disease, which typically requires a large training data set for selecting truly associated single nucleotide polymorphisms (SNPs) and estimating effect sizes accurately. Here, we develop a comprehensive penalized regression framework for fitting regularized regression models to GWAS summary statistics. We propose incorporating Pleiotropy and ANnotation information into PRS (PANPRS) development through suitable formulation of penalty functions and associated tuning parameters. Extensive simulations show that PANPRS performs as well as or better than existing PRS methods when no functional annotation or pleiotropy is incorporated. When functional annotation data and pleiotropy are informative, PANPRS substantially outperforms existing PRS methods in simulations. Finally, we applied our methods to build PRS for type 2 diabetes and melanoma and found that incorporating relevant functional annotations and GWAS of genetically related traits improved prediction of these two complex diseases.
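To make the idea of fitting a regularized regression to GWAS summary statistics concrete, here is a minimal sketch under stated assumptions. It uses a plain L1 (lasso) penalty and is not the PANPRS penalty, whose annotation and pleiotropy terms are not specified in this abstract. It assumes standardized genotypes, a vector `r` of marginal SNP-trait correlations derived from summary statistics, and an LD (SNP-SNP correlation) matrix `R` from a reference panel. The function names and toy numbers are hypothetical.

```python
def soft_threshold(z, lam):
    """Soft-thresholding operator: sign(z) * max(|z| - lam, 0)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def summary_stat_lasso(r, R, lam, n_iter=200, tol=1e-8):
    """Coordinate descent for  min_b  b'Rb - 2 b'r + 2*lam*||b||_1.

    r   : list of marginal SNP-trait correlations (assumed input)
    R   : LD matrix as a list of lists, with positive diagonal (assumed input)
    lam : L1 tuning parameter (illustrative; not a PANPRS parameter)
    """
    p = len(r)
    beta = [0.0] * p
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            # Correlation left for SNP j after removing the other SNPs' effects
            z = r[j] - sum(R[j][k] * beta[k] for k in range(p) if k != j)
            new = soft_threshold(z, lam) / R[j][j]
            max_change = max(max_change, abs(new - beta[j]))
            beta[j] = new
        if max_change < tol:  # stop once a full sweep changes nothing
            break
    return beta


# Toy example: three uncorrelated SNPs, so each effect is simply
# the marginal correlation shrunk toward zero by lam.
R_toy = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
r_toy = [0.3, 0.05, -0.2]
beta_toy = summary_stat_lasso(r_toy, R_toy, lam=0.1)

# With LD and no penalty, the update solves R b = r.
beta_ld = summary_stat_lasso([0.5, 0.25], [[1.0, 0.5], [0.5, 1.0]], lam=0.0)
```

In the toy example the second SNP's weak marginal signal falls below the threshold and is set to zero, which is how a penalty of this kind performs SNP selection and effect-size shrinkage at once. In practice the tuning parameter would be chosen on validation data.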
