Martínez Carlos Alberto, Khare Kshitij, Banerjee Arunava, Elzo Mauricio A
Department of Animal Sciences, University of Florida, Gainesville, FL, USA; Department of Statistics, University of Florida, Gainesville, FL, USA.
Department of Statistics, University of Florida, Gainesville, FL, USA.
J Theor Biol. 2017 Mar 21;417:8-19. doi: 10.1016/j.jtbi.2016.12.020. Epub 2016 Dec 31.
It is important to consider heterogeneity of marker effects and allelic frequencies in across-population genome-wide prediction studies. Moreover, all regression models used in genome-wide prediction overlook randomness of genotypes. In this study, a family of hierarchical Bayesian models was developed to perform across-population genome-wide prediction, modeling genotypes as random variables and allowing population-specific effects for each marker. The models shared a common structure and differed in the priors used and in the assumption about residual variances (homogeneous or heterogeneous). Randomness of genotypes was accounted for by deriving the joint probability mass function of marker genotypes conditional on allelic frequencies and pedigree information. As a consequence, these models incorporated kinship and genotypic information, which made it possible not only to account for heterogeneity of allelic frequencies but also to include individuals with missing genotypes at some or all loci without prior imputation. This was possible because the non-observed fraction of the design matrix was treated as an unknown model parameter. For each model, a simpler version that ignored population structure, but still accounted for randomness of genotypes, was proposed. Implementation of these models and computation of some criteria for model comparison were illustrated using two simulated datasets. Theoretical and computational issues, along with possible applications, extensions and refinements, were discussed. Some features of the models developed in this study make them promising for genome-wide prediction; the use of information contained in the probability distribution of genotypes is perhaps the most appealing. Further studies are needed to assess the performance of the models proposed here and to compare them with conventional models used in genome-wide prediction.
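The idea of a joint probability mass function of genotypes conditional on allelic frequencies and pedigree can be illustrated with a minimal sketch. It is not the derivation used in the study: it assumes a single biallelic locus, Hardy-Weinberg proportions for founders, Mendelian transmission for non-founders, and parents that are either both known or both unknown. All function names and the example trio are hypothetical. The last function shows, under the same assumptions, how a missing genotype can be summed out rather than imputed.

```python
from itertools import product

def founder_pmf(g, p):
    """Hardy-Weinberg probability of genotype g (0, 1 or 2 copies of the
    reference allele) given reference-allele frequency p (assumption)."""
    return [(1 - p) ** 2, 2 * p * (1 - p), p ** 2][g]

def offspring_pmf(g, g_sire, g_dam):
    """Mendelian probability of offspring genotype g given parental genotypes."""
    s = g_sire / 2.0  # probability the sire transmits the reference allele
    d = g_dam / 2.0   # probability the dam transmits the reference allele
    return [(1 - s) * (1 - d), s * (1 - d) + (1 - s) * d, s * d][g]

def joint_pmf(genotypes, pedigree, p):
    """Joint probability of all genotypes at one locus.

    genotypes: dict mapping individual id -> genotype in {0, 1, 2}
    pedigree:  dict mapping individual id -> (sire id, dam id), or
               (None, None) for founders (hypothetical layout)
    """
    prob = 1.0
    for ind, (sire, dam) in pedigree.items():
        if sire is None and dam is None:
            prob *= founder_pmf(genotypes[ind], p)
        else:
            prob *= offspring_pmf(genotypes[ind], genotypes[sire], genotypes[dam])
    return prob

def marginal_pmf(observed, pedigree, p):
    """Probability of the observed genotypes, summing over every possible
    value of the missing ones (brute force; only viable for tiny pedigrees)."""
    missing = [ind for ind in pedigree if ind not in observed]
    total = 0.0
    for combo in product((0, 1, 2), repeat=len(missing)):
        full = dict(observed)
        full.update(zip(missing, combo))
        total += joint_pmf(full, pedigree, p)
    return total

# Hypothetical trio: two founders and one offspring.
trio = {"sire": (None, None), "dam": (None, None), "kid": ("sire", "dam")}

print(joint_pmf({"sire": 2, "dam": 0, "kid": 1}, trio, p=0.5))  # 0.0625
print(marginal_pmf({"kid": 1}, trio, p=0.5))  # parents' genotypes missing
```

The brute-force sum grows as 3 to the power of the number of missing genotypes, so it only serves to make the factorization concrete. A sampling-based treatment of the unobserved genotypes, in the spirit of treating them as unknown parameters, would be needed for realistic pedigrees.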