Wageningen University and Research, Animal Breeding and Genomics, 6700 AH, The Netherlands
Genetics. 2018 Sep;210(1):53-69. doi: 10.1534/genetics.118.301109. Epub 2018 Jul 18.
This study presents a method for genomic prediction that uses individual-level data and summary statistics from multiple populations. Genome-wide markers are nowadays widely used to predict complex traits, and genomic prediction using multi-population data is an appealing approach to achieve higher prediction accuracies. However, sharing of individual-level data across populations is not always possible. We present a method that enables integration of summary statistics from separate analyses with the available individual-level data. The data can consist of individuals with either single or multiple (weighted) phenotype records. We developed a method based on a hypothetical joint analysis model and absorption of population-specific information. We show that population-specific information is fully captured by the estimated allele substitution effects and the accuracy of those estimates, i.e., the summary statistics. The method gives results identical to those of the joint analysis of all individual-level data when complete summary statistics are available. We provide a series of easy-to-use approximations that can be used when complete summary statistics are not available or are impractical to share. Simulations show that these approximations enable integration of different sources of information across a wide range of settings, yielding accurate predictions. The method can be readily extended to multiple traits. In summary, the developed method enables integration of genome-wide data, in the form of individual-level data or summary statistics from multiple populations, to obtain more accurate estimates of allele substitution effects and more accurate genomic predictions.
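The claim that complete summary statistics reproduce the joint analysis can be illustrated with a small numerical sketch. It is not the authors' implementation. It assumes a simple ridge-regression (SNP-BLUP-like) model with no fixed effects and a known shrinkage parameter `lam`, and it represents the "accuracy of the estimates" by the coefficient matrix `C1 = X1'X1 + lam*I` of the separate analysis. All names (`X1`, `y1`, `b1_hat`, `C1`, `lam`) and the simulated data are hypothetical. Under these assumptions, absorbing population 1 through `b1_hat` and `C1` gives the same equations as pooling the individual-level data.

```python
import random

def xtx(X):
    """Return X'X for a matrix stored as a list of rows."""
    p = len(X[0])
    return [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]

def xty(X, y):
    """Return X'y."""
    p = len(X[0])
    return [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]

def matvec(A, v):
    """Return the matrix-vector product A v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def mat_add(A, B):
    """Return the element-wise sum A + B."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def add_ridge(A, lam):
    """Return A + lam * I."""
    return [[a + (lam if i == j else 0.0) for j, a in enumerate(row)]
            for i, row in enumerate(A)]

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def simulate(n, p, effects, rng):
    """Simulate 0/1/2 genotypes and phenotypes (hypothetical toy data)."""
    X = [[float(rng.choice([0, 1, 2])) for _ in range(p)] for _ in range(n)]
    y = [sum(x * b for x, b in zip(row, effects)) + rng.gauss(0, 1) for row in X]
    return X, y

rng = random.Random(1)
p, lam = 5, 2.0                      # number of markers, assumed shrinkage
true_b = [rng.gauss(0, 0.5) for _ in range(p)]
X1, y1 = simulate(40, p, true_b, rng)   # population 1: data cannot be shared
X2, y2 = simulate(30, p, true_b, rng)   # population 2: individual-level data

# Reference: joint analysis of all individual-level data.
lhs_joint = add_ridge(mat_add(xtx(X1), xtx(X2)), lam)
rhs_joint = [a + b for a, b in zip(xty(X1, y1), xty(X2, y2))]
b_joint = solve(lhs_joint, rhs_joint)

# Separate analysis of population 1; only b1_hat and C1 are shared.
C1 = add_ridge(xtx(X1), lam)
b1_hat = solve(C1, xty(X1, y1))

# Integration: C1 already contains lam*I, and C1 * b1_hat equals X1'y1.
lhs_comb = mat_add(C1, xtx(X2))
rhs_comb = [a + b for a, b in zip(matvec(C1, b1_hat), xty(X2, y2))]
b_comb = solve(lhs_comb, rhs_comb)

max_diff = max(abs(a - b) for a, b in zip(b_joint, b_comb))
print("largest difference between joint and integrated estimates:", max_diff)
```

In this sketch the two sets of estimates agree up to floating-point error because the shared quantities carry exactly the information that population 1 contributes to the joint equations. When the full matrix `C1` cannot be shared, it would have to be replaced by an approximation, and the equivalence would no longer be exact.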