Improving GWAS discovery and genomic prediction accuracy in biobank data.

Affiliations

Scientific Computing and Research Support Unit, University of Lausanne, 1015 Lausanne, Switzerland.

Department of Quantitative Biomedicine, University of Zurich, 8057 Zurich, Switzerland.

Publication information

Proc Natl Acad Sci U S A. 2022 Aug 2;119(31):e2121279119. doi: 10.1073/pnas.2121279119. Epub 2022 Jul 29.

DOI: 10.1073/pnas.2121279119
PMID: 35905320
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9351350/
Abstract

Genetically informed, deep-phenotyped biobanks are an important research resource and it is imperative that the most powerful, versatile, and efficient analysis approaches are used. Here, we apply our recently developed Bayesian grouped mixture of regressions model (GMRM) in the UK and Estonian Biobanks and obtain the highest genomic prediction accuracy reported to date across 21 heritable traits. When compared to other approaches, GMRM accuracy was greater than annotation prediction models run in the LDAK or LDPred-funct software by 15% (SE 7%) and 14% (SE 2%), respectively, and was 18% (SE 3%) greater than a baseline BayesR model without single-nucleotide polymorphism (SNP) markers grouped into minor allele frequency-linkage disequilibrium (MAF-LD) annotation categories. For height, the prediction accuracy was 47% in a UK Biobank holdout sample, which was 76% of the estimated $h^2_{\mathrm{SNP}}$. We then extend our GMRM prediction model to provide mixed-linear model association (MLMA) SNP marker estimates for genome-wide association (GWAS) discovery, which increased the independent loci detected to 16,162 in unrelated UK Biobank individuals, compared to 10,550 from BoltLMM and 10,095 from Regenie, a 62 and 65% increase, respectively. The average $\chi^2$ value of the leading markers increased by 15.24 (SE 0.41) for every 1% increase in prediction accuracy gained over a baseline BayesR model across the traits. Thus, we show that modeling genetic associations accounting for MAF and LD differences among SNP markers, and incorporating prior knowledge of genomic function, is important for both genomic prediction and discovery in large-scale individual-level studies.
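
The headline figures in the abstract can be related to one another with a little arithmetic. The following is only a rough sketch: the variable and function names are illustrative, and it assumes that the quantity height accuracy is compared against is the estimated SNP heritability, so that accuracy = share × heritability. It does not assume which definition of "increase" the authors used for the loci counts; it prints both the relative increase and the plain count ratio for comparison.

```python
# Figures quoted in the abstract; all names here are illustrative only.
height_accuracy = 0.47    # prediction accuracy for height, UK Biobank holdout sample
share_of_estimate = 0.76  # accuracy expressed as a share of the estimated quantity

# Assumption: the estimated quantity is the SNP heritability, so that
# accuracy = share * heritability and the estimate can be backed out.
implied_estimate = height_accuracy / share_of_estimate

# Independent loci detected in unrelated UK Biobank individuals, per method.
independent_loci = {"GMRM": 16162, "BoltLMM": 10550, "Regenie": 10095}


def relative_increase(new: int, old: int) -> float:
    """Return the relative increase (new - old) / old."""
    return (new - old) / old


def count_ratio(old: int, new: int) -> float:
    """Return the plain ratio old / new."""
    return old / new


print(f"implied estimate for height: {implied_estimate:.3f}")

gmrm = independent_loci["GMRM"]
for method in ("BoltLMM", "Regenie"):
    other = independent_loci[method]
    print(
        f"{method}: relative increase {relative_increase(gmrm, other):.1%}, "
        f"count ratio {count_ratio(other, gmrm):.1%}"
    )
```

Printing both quantities side by side makes it easy to check which convention a reported percentage follows before comparing it with figures from other studies.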

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/de82/9351350/b312bf85d138/pnas.2121279119fig03.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/de82/9351350/b3a7295ed63f/pnas.2121279119fig01.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/de82/9351350/d54bae760efd/pnas.2121279119fig02.jpg

Similar articles

1
Improving GWAS discovery and genomic prediction accuracy in biobank data.
Proc Natl Acad Sci U S A. 2022 Aug 2;119(31):e2121279119. doi: 10.1073/pnas.2121279119. Epub 2022 Jul 29.
2
A Multiple-Trait Bayesian Lasso for Genome-Enabled Analysis and Prediction of Complex Traits.
Genetics. 2020 Feb;214(2):305-331. doi: 10.1534/genetics.119.302934. Epub 2019 Dec 26.
3
Improving Genomic Predictions in Multi-Breed Cattle Populations: A Comparative Analysis of BayesR and GBLUP Models.
Genes (Basel). 2024 Feb 18;15(2):253. doi: 10.3390/genes15020253.
4
Genomic Prediction for Grain Yield and Yield-Related Traits in Chinese Winter Wheat.
Int J Mol Sci. 2020 Feb 17;21(4):1342. doi: 10.3390/ijms21041342.
5
Exploiting biological priors and sequence variants enhances QTL discovery and genomic prediction of complex traits.
BMC Genomics. 2016 Feb 27;17:144. doi: 10.1186/s12864-016-2443-6.
6
Short communication: Improving the accuracy of genomic prediction of body conformation traits in Chinese Holsteins using markers derived from high-density marker panels.
J Dairy Sci. 2018 Jun;101(6):5250-5254. doi: 10.3168/jds.2017-13456. Epub 2018 Mar 15.
7
Accuracy of prediction of simulated polygenic phenotypes and their underlying quantitative trait loci genotypes using real or imputed whole-genome markers in cattle.
Genet Sel Evol. 2015 Dec 23;47:99. doi: 10.1186/s12711-015-0179-4.
8
Genomic prediction based on selective linkage disequilibrium pruning of low-coverage whole-genome sequence variants in a pure Duroc population.
Genet Sel Evol. 2023 Oct 18;55(1):72. doi: 10.1186/s12711-023-00843-w.
9
Improved genetic prediction of complex traits from individual-level data or summary statistics.
Nat Commun. 2021 Jul 7;12(1):4192. doi: 10.1038/s41467-021-24485-y.
10
Accuracy of Genomic Prediction in Synthetic Populations Depending on the Number of Parents, Relatedness, and Ancestral Linkage Disequilibrium.
Genetics. 2017 Jan;205(1):441-454. doi: 10.1534/genetics.116.193243. Epub 2016 Nov 9.

Cited by

1
Separating direct, indirect and parent-of-origin genetic effects in the human population.
bioRxiv. 2025 Aug 27:2025.04.28.650988. doi: 10.1101/2025.04.28.650988.
2
Polygenic prediction of body mass index and obesity through the life course and across ancestries.
Nat Med. 2025 Jul 21. doi: 10.1038/s41591-025-03827-z.
3
MINE: maximally informative next experiment-toward a new GWAS experimental design and methodology.
G3 (Bethesda). 2025 Sep 3;15(9). doi: 10.1093/g3journal/jkaf163.
4
Polygenic risk score prediction accuracy convergence.
HGG Adv. 2025 May 14;6(3):100457. doi: 10.1016/j.xhgg.2025.100457.
5
Evaluation of genomic selection models using whole genome sequence data and functional annotation in Belgian Blue cattle.
Genet Sel Evol. 2025 Mar 4;57(1):10. doi: 10.1186/s12711-025-00955-5.
6
Genetic association studies using disease liabilities from deep neural networks.
Am J Hum Genet. 2025 Mar 6;112(3):675-692. doi: 10.1016/j.ajhg.2025.01.019. Epub 2025 Feb 21.
7
Quantitative omnigenic model discovers interpretable genome-wide associations.
Proc Natl Acad Sci U S A. 2024 Oct 29;121(44):e2402340121. doi: 10.1073/pnas.2402340121. Epub 2024 Oct 23.
8
Genetically predicted dietary intake and risks of colorectal cancer: a Mendelian randomisation study.
BMC Cancer. 2024 Sep 17;24(1):1153. doi: 10.1186/s12885-024-12923-1.
9
Evaluation of heritability partitioning approaches in livestock populations.
BMC Genomics. 2024 Jul 13;25(1):690. doi: 10.1186/s12864-024-10600-y.
10
Single nucleotide polymorphism SNP19140160 A > C is a potential breeding locus for fast-growth largemouth bass (Micropterus salmoides).
BMC Genomics. 2024 Jan 16;25(1):64. doi: 10.1186/s12864-024-09962-0.

References

1
A simple new approach to variable selection in regression, with application to genetic fine mapping.
J R Stat Soc Series B Stat Methodol. 2020 Dec;82(5):1273-1300. doi: 10.1111/rssb.12388. Epub 2020 Jul 10.
2
Probabilistic inference of the genetic architecture underlying functional enrichment of complex traits.
Nat Commun. 2021 Nov 30;12(1):6972. doi: 10.1038/s41467-021-27258-9.
3
Incorporating functional priors improves polygenic prediction accuracy in UK Biobank and 23andMe data sets.
Nat Commun. 2021 Oct 18;12(1):6052. doi: 10.1038/s41467-021-25171-9.
4
Improved genetic prediction of complex traits from individual-level data or summary statistics.
Nat Commun. 2021 Jul 7;12(1):4192. doi: 10.1038/s41467-021-24485-y.
5
Computationally efficient whole-genome regression for quantitative and binary traits.
Nat Genet. 2021 Jul;53(7):1097-1103. doi: 10.1038/s41588-021-00870-7. Epub 2021 May 20.
6
Functionally informed fine-mapping and polygenic localization of complex trait heritability.
Nat Genet. 2020 Dec;52(12):1355-1363. doi: 10.1038/s41588-020-00735-5. Epub 2020 Nov 16.
7
Evaluating and improving heritability models using summary statistics.
Nat Genet. 2020 Apr;52(4):458-462. doi: 10.1038/s41588-020-0600-y. Epub 2020 Mar 23.
8
A resource-efficient tool for mixed model association analysis of large-scale data.
Nat Genet. 2019 Dec;51(12):1749-1755. doi: 10.1038/s41588-019-0530-8. Epub 2019 Nov 25.
9
Improved polygenic prediction by Bayesian multiple regression on summary statistics.
Nat Commun. 2019 Nov 8;10(1):5086. doi: 10.1038/s41467-019-12653-0.
10
Accurate estimation of SNP-heritability from biobank-scale data irrespective of genetic architecture.
Nat Genet. 2019 Aug;51(8):1244-1251. doi: 10.1038/s41588-019-0465-0. Epub 2019 Jul 29.