Genomic Prediction Using Individual-Level Data and Summary Statistics from Multiple Populations.

Affiliations

Wageningen University and Research, Animal Breeding and Genomics, 6700 AH, The Netherlands

Publication information

Genetics. 2018 Sep;210(1):53-69. doi: 10.1534/genetics.118.301109. Epub 2018 Jul 18.

DOI:10.1534/genetics.118.301109

PMID:30021793

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6116972/

Abstract

This study presents a method for genomic prediction that uses individual-level data and summary statistics from multiple populations. Genome-wide markers are now widely used to predict complex traits, and genomic prediction using multi-population data is an appealing approach to achieve higher prediction accuracies. However, sharing individual-level data across populations is not always possible. We present a method that enables integration of summary statistics from separate analyses with the available individual-level data. The data can consist of individuals with either single or multiple (weighted) phenotype records. We developed the method from a hypothetical joint analysis model and absorption of population-specific information. We show that population-specific information is fully captured by the estimated allele substitution effects and the accuracy of those estimates, i.e., the summary statistics. When complete summary statistics are available, the method gives results identical to those of a joint analysis of all individual-level data. We provide a series of easy-to-use approximations for cases where complete summary statistics are unavailable or impractical to share. Simulations show that these approximations enable integration of different sources of information across a wide range of settings, yielding accurate predictions. The method can be readily extended to multiple traits. In summary, the developed method enables integration of genome-wide data in the form of individual-level data or summary statistics from multiple populations to obtain more accurate estimates of allele substitution effects and genomic predictions.
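The equivalence between a joint analysis and an analysis that uses complete summary statistics can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes a plain ridge-regression (SNP-BLUP-like) model with no fixed effects, a single shrinkage parameter `lam` shared by both populations, and simulated genotypes; all names (`X1`, `X2`, `C2`, `a2_hat`, `lam`) are hypothetical. Under these assumptions, population 2 shares only its estimated effects `a2_hat` and the coefficient matrix `C2` of its own analysis (whose inverse is proportional to the prediction error covariance of those estimates), and population 1 combines them with its individual-level data.

```python
import numpy as np

rng = np.random.default_rng(0)
n1, n2, m = 60, 80, 20   # individuals in populations 1 and 2, number of markers
lam = 5.0                # hypothetical shrinkage parameter (assumed equal in both analyses)

# Simulated allele counts (0/1/2) and phenotypes; purely illustrative data.
true_a = rng.normal(0.0, 0.3, m)
X1 = rng.binomial(2, 0.3, size=(n1, m)).astype(float)
X2 = rng.binomial(2, 0.3, size=(n2, m)).astype(float)
y1 = X1 @ true_a + rng.normal(size=n1)
y2 = X2 @ true_a + rng.normal(size=n2)

I = np.eye(m)

# Separate analysis of population 2: its summary statistics are the
# estimated effects a2_hat and the coefficient matrix C2.
C2 = X2.T @ X2 + lam * I
a2_hat = np.linalg.solve(C2, X2.T @ y2)

# Reference: joint analysis with individual-level data from both populations.
a_joint = np.linalg.solve(X1.T @ X1 + X2.T @ X2 + lam * I,
                          X1.T @ y1 + X2.T @ y2)

# Combined analysis: individual-level data from population 1 plus the
# summary statistics from population 2. C2 already contains lam * I,
# so the shrinkage term is counted only once.
a_comb = np.linalg.solve(X1.T @ X1 + C2,
                         X1.T @ y1 + C2 @ a2_hat)

print(np.allclose(a_joint, a_comb))
```

The two solutions agree because `C2 @ a2_hat` reproduces `X2.T @ y2` exactly, so the combined system is algebraically the same as the joint one. When only part of `C2` can be shared (for example, only its diagonal), the result becomes an approximation, which is the situation the approximations mentioned in the abstract address.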
