Department of Agricultural Sciences, University of Helsinki, Helsinki FIN-00014.
G3 (Bethesda). 2013 Sep 4;3(9):1511-23. doi: 10.1534/g3.113.007096.
Because of the increased availability of genome-wide sets of molecular markers, along with the reduced cost of genotyping large samples of individuals, genomic estimated breeding values have become an essential resource in plant and animal breeding. Bayesian methods for breeding value estimation have proven to be accurate and efficient; however, ever-increasing data sets are placing heavy demands on the parameter estimation algorithms. Although a considerable number of fast estimation algorithms are available for Bayesian models of continuous Gaussian traits, there is a shortage of such algorithms for the corresponding models of discrete or censored phenotypes. In this work, we consider a threshold approach to binary, ordinal, and censored Gaussian observations for Bayesian multilocus association models and Bayesian genomic best linear unbiased prediction, and present a high-speed generalized expectation maximization algorithm for parameter estimation under these models. We demonstrate our method with simulated and real data. Our example analyses suggest that using the extra information present in an ordered categorical or censored Gaussian data set, instead of dichotomizing the data into case-control observations, increases the accuracy of genomic breeding values predicted by Bayesian multilocus association models or by Bayesian genomic best linear unbiased prediction. Furthermore, the example analyses indicate that, with censored Gaussian data, the correct threshold model is more accurate than a Gaussian model applied directly to the data, whereas with binary or ordinal data the superiority of the threshold model could not be confirmed.
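To illustrate the threshold idea for binary, ordinal, and censored Gaussian observations, the following is a minimal sketch of the conditional expectation of a latent Gaussian liability given the observed category, which is the kind of quantity an expectation step under a threshold model typically requires. It is not the authors' algorithm. The function names, the unit residual variance, and the fixed thresholds are assumptions made only for this example; the formula itself is the standard mean of a truncated normal distribution.

```python
import math


def norm_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def truncated_normal_mean(mu, sigma, lower, upper):
    """Mean of N(mu, sigma^2) restricted to the interval (lower, upper).

    Infinite bounds are allowed. The ratio becomes numerically unstable
    when the interval lies far in a tail (mass close to zero).
    """
    a = (lower - mu) / sigma
    b = (upper - mu) / sigma
    mass = norm_cdf(b) - norm_cdf(a)
    return mu + sigma * (norm_pdf(a) - norm_pdf(b)) / mass


def expected_liability_ordinal(category, mu, thresholds, sigma=1.0):
    """Expected latent liability for an ordinal observation.

    Hypothetical helper: `thresholds` is an increasing list of cut
    points; category k (0-based) means the liability falls between
    cut point k-1 and cut point k. A binary trait is the special
    case of a single threshold.
    """
    cuts = [-math.inf] + list(thresholds) + [math.inf]
    return truncated_normal_mean(mu, sigma, cuts[category], cuts[category + 1])


def expected_liability_censored(observed, mu, censor_point, sigma=1.0):
    """Expected latent value for a right-censored Gaussian observation.

    Assumption for this sketch: values at or above `censor_point` are
    censored, so only "liability >= censor_point" is known; otherwise
    the observation itself is the latent value.
    """
    if observed >= censor_point:
        return truncated_normal_mean(mu, sigma, censor_point, math.inf)
    return observed


# Usage: three ordered categories with assumed cut points 0 and 1.
latent = [expected_liability_ordinal(k, 0.0, [0.0, 1.0]) for k in range(3)]
```

In this sketch an ordinal observation narrows the latent liability to one of several intervals, whereas dichotomized data only indicate the side of a single threshold, which is one way to see where the extra information mentioned above comes from.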