Multibreed genomic prediction using multitrait genomic residual maximum likelihood and multitask Bayesian variable selection.

Author information

Wageningen University & Research, Animal Breeding and Genomics, PO Box 338, 6700 AH Wageningen, the Netherlands.

Faculty of Veterinary and Agricultural Science, University of Melbourne, Melbourne, Victoria 3010, Australia; Agriculture Research, Department of Economic Development, Jobs, Transport and Resources, Melbourne, Victoria 3083, Australia.

Publication information

J Dairy Sci. 2018 May;101(5):4279-4294. doi: 10.3168/jds.2017-13366. Epub 2018 Mar 15.

DOI:10.3168/jds.2017-13366

PMID:29550121

Abstract

Genomic prediction is applicable to individuals of different breeds. Empirical results to date, however, show limited benefits in using information on multiple breeds in the context of genomic prediction. We investigated a multitask Bayesian model, presented previously by others, implemented in a Bayesian stochastic search variable selection (BSSVS) model. This model allowed for evidence of quantitative trait loci (QTL) to be accumulated across breeds or for both QTL that segregate across breeds and breed-specific QTL. In both cases, single nucleotide polymorphism effects were estimated with information from a single breed. Other models considered were a single-trait and multitrait genomic residual maximum likelihood (GREML) model, with breeds considered as different traits, and a single-trait BSSVS model. All single-trait models were applied to each of the 2 breeds separately and to the pooled data of both breeds. The data used included a training data set of 6,278 Holstein and 722 Jersey bulls, as well as 374 Jersey validation bulls. All animals had genotypes for 474,773 single nucleotide polymorphisms after editing and phenotypes for milk, fat, and protein yields. Using the same training data, BSSVS consistently outperformed GREML. The multitask BSSVS, however, did not outperform single-trait BSSVS, which used pooled Holstein and Jersey data for training. Thus, the rigorous assumption that the traits are the same in both breeds yielded a slightly better prediction than a model that had to estimate the correlation between the breeds from the data. Adding the Holstein data significantly increased the accuracy of the single-trait GREML and BSSVS in predicting the Jerseys for milk and protein, in line with estimated correlations between the breeds of 0.66 and 0.47 for milk and protein yields, whereas only the BSSVS model significantly improved the accuracy for fat yield with an estimated correlation between breeds of only 0.05. 
The relatively high genetic correlations for milk and protein yields, and the superiority of the pooling strategy, are likely the result of the admixture observed between the two breeds in our data. The Bayesian model was able to detect several QTL in Holsteins, which likely enabled it to outperform GREML. The inability of the multitask Bayesian models to outperform a simple pooling strategy may be explained by the fact that the pooling strategy assumes equal effects in both breeds; furthermore, this assumption may be valid for moderate- to large-sized QTL, which are important for multibreed genomic prediction.
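
The pooling strategy described above can be illustrated with a minimal sketch. It is not the authors' GREML or BSSVS implementation: it uses ridge-regression SNP-BLUP with a fixed shrinkage parameter `lam` (no variance components are estimated), and all data are simulated. The function names, sample sizes, allele frequencies, and the choice to give both simulated breeds identical SNP effects are assumptions made for illustration only; the last of these is exactly the equal-effects assumption that pooling relies on.

```python
import numpy as np

def fit_snp_blup(Z, y, lam):
    """Ridge-regression SNP-BLUP with a fixed shrinkage parameter lam (assumed, not estimated)."""
    col_means = Z.mean(axis=0)
    Zc = Z - col_means                      # centre genotypes on the training set
    mu = y.mean()
    lhs = Zc.T @ Zc + lam * np.eye(Z.shape[1])
    effects = np.linalg.solve(lhs, Zc.T @ (y - mu))
    return mu, col_means, effects

def predict(model, Z):
    """Predict phenotypes for genotypes Z from a fitted (mu, col_means, effects) tuple."""
    mu, col_means, effects = model
    return mu + (Z - col_means) @ effects

def simulate_breed(rng, n, freq, effects, h2=0.5):
    """Simulate 0/1/2 genotypes and phenotypes with heritability of roughly h2."""
    Z = rng.binomial(2, freq, size=(n, freq.size)).astype(float)
    g = Z @ effects
    noise_sd = np.sqrt(g.var() * (1 - h2) / h2)
    return Z, g + rng.normal(0.0, noise_sd, n)

def compare_training_sets(seed=1, p=300, n_qtl=30, lam=100.0):
    """Validate in the small breed after training on that breed alone or on pooled data."""
    rng = np.random.default_rng(seed)
    # Hypothetical allele frequencies: the small breed drifts away from the large one.
    freq_big = rng.uniform(0.1, 0.9, p)
    freq_small = np.clip(freq_big + rng.normal(0, 0.1, p), 0.05, 0.95)
    # Same SNP effects in both breeds: the assumption behind simple pooling.
    effects = np.zeros(p)
    effects[rng.choice(p, n_qtl, replace=False)] = rng.normal(0, 1, n_qtl)

    Z_big, y_big = simulate_breed(rng, 600, freq_big, effects)
    Z_small, y_small = simulate_breed(rng, 80, freq_small, effects)
    Z_val, y_val = simulate_breed(rng, 60, freq_small, effects)

    single = fit_snp_blup(Z_small, y_small, lam)
    pooled = fit_snp_blup(np.vstack([Z_big, Z_small]),
                          np.concatenate([y_big, y_small]), lam)

    def accuracy(model):
        # Correlation between predicted and observed phenotypes in the validation set.
        return np.corrcoef(predict(model, Z_val), y_val)[0, 1]

    return {"single_breed": accuracy(single), "pooled": accuracy(pooled)}

if __name__ == "__main__":
    print(compare_training_sets())
```

Because the simulated effects are identical across breeds, the sketch only shows the mechanics of pooling under its most favourable assumption; it says nothing about how pooling compares with a multitrait or multitask model when effects differ between breeds.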
