Wang Tingting, Chen Yi-Ping Phoebe, Bowman Phil J, Goddard Michael E, Hayes Ben J
School of Engineering and Mathematical Sciences, La Trobe University, Melbourne, VIC, Australia.
Biosciences Research, Department of Economic Development, Jobs, Transport and Resources, Bundoora, Melbourne, VIC, Australia.
BMC Genomics. 2016 Sep 21;17(1):744. doi: 10.1186/s12864-016-3082-7.
Bayesian mixture models in which the effects of SNPs are assumed to come from normal distributions with different variances are attractive for simultaneous genomic prediction and QTL mapping. These models are usually implemented with Markov chain Monte Carlo (MCMC) sampling, which requires long compute times with large genomic data sets. Here, we present an efficient approach (termed HyB_BR), a hybrid scheme in which an Expectation-Maximisation algorithm is followed by a limited number of MCMC iterations, without the requirement for burn-in.
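The two-stage idea described above can be illustrated with a minimal sketch. This is not the authors' HyB_BR implementation: the function names, the fixed component variances, the iteration counts and the simplified expectation step (which ignores the uncertainty correction a full EM derivation would carry) are all assumptions made for illustration. The first component is assumed to have zero variance, so it represents SNPs with no effect.

```python
import numpy as np

def component_posteriors(r, xx, sigma_e2, variances, pi):
    """Class probabilities, conditional means and variances for one SNP effect."""
    log_p = np.log(np.maximum(pi, 1e-12))
    means = np.zeros(len(variances))
    var_b = np.zeros(len(variances))
    nz = variances > 0                      # components with a non-zero effect
    lam = sigma_e2 / variances[nz]          # ridge-type shrinkage per component
    log_p[nz] += (-0.5 * np.log1p(variances[nz] * xx / sigma_e2)
                  + 0.5 * r ** 2 / (sigma_e2 * (xx + lam)))
    means[nz] = r / (xx + lam)
    var_b[nz] = sigma_e2 / (xx + lam)
    p = np.exp(log_p - log_p.max())
    return p / p.sum(), means, var_b

def hybrid_fit(X, y, variances, em_iters=30, mcmc_iters=200, seed=0):
    """EM-style sweeps, then a short Gibbs chain started from the EM solution."""
    rng = np.random.default_rng(seed)
    variances = np.asarray(variances, dtype=float)  # assumption: variances[0] == 0
    X = X - X.mean(axis=0)                  # centring removes the need for an intercept
    e = y - y.mean()                        # running residual, y - X b
    n, m = X.shape
    K = len(variances)
    xx = np.einsum("ij,ij->j", X, X)
    b = np.zeros(m)
    pi = np.full(K, 1.0 / K)
    sigma_e2 = e.var()

    def sweep(sample):
        nonlocal e, pi, sigma_e2
        probs = np.zeros((m, K))
        for j in range(m):
            e += X[:, j] * b[j]             # take SNP j out of the current fit
            r = X[:, j] @ e
            p, means, var_b = component_posteriors(r, xx[j], sigma_e2, variances, pi)
            if sample:                      # Gibbs step: draw class, then effect
                k = rng.choice(K, p=p)
                b[j] = rng.normal(means[k], np.sqrt(var_b[k])) if variances[k] > 0 else 0.0
                probs[j, k] = 1.0
            else:                           # EM-style step: use expectations
                b[j] = p @ means
                probs[j] = p
            e -= X[:, j] * b[j]
        if sample:
            pi = rng.dirichlet(probs.sum(axis=0) + 1.0)
            sigma_e2 = (e @ e) / rng.chisquare(n)
        else:
            pi = probs.mean(axis=0)
            sigma_e2 = (e @ e) / n
        return probs

    for _ in range(em_iters):               # stage 1: deterministic sweeps
        sweep(sample=False)
    b_sum, pip_sum = np.zeros(m), np.zeros(m)
    for _ in range(mcmc_iters):             # stage 2: short chain, no burn-in discarded
        probs = sweep(sample=True)
        b_sum += b
        pip_sum += 1.0 - probs[:, 0]
    return b_sum / mcmc_iters, pip_sum / mcmc_iters
```

Calling `hybrid_fit(X, y, variances=[0.0, 0.01, 0.1, 1.0])` returns the averaged SNP effects and, for each SNP, the fraction of sampled iterations in which it fell in a non-zero component, which serves as an estimate of its posterior probability of a non-zero effect. Because the chain starts from the EM-style solution, every sampled iteration is kept, mirroring the "no burn-in" idea in the text.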
To test the prediction accuracy of HyB_BR, dairy cattle and human disease trait data were used. In the dairy cattle data, there were four quantitative traits (milk volume, protein yield (kg), fat percentage in milk and fertility) measured in 16,214 cattle from two breeds genotyped for 632,002 SNPs. Genomic predictions were validated either in a subset of cattle from the reference set or in animals from a third breed that was not in the reference set. In all cases, HyB_BR gave almost identical accuracies to Bayesian mixture models implemented with full MCMC; however, computational time was reduced to as little as 1/17 of that required by full MCMC. The SNPs with high posterior probability of a non-zero effect were also very similar between full MCMC and HyB_BR, with several known genes affecting milk production in this category, as well as some novel genes. HyB_BR was also applied to seven human diseases with 4,890 individuals genotyped for approximately 300,000 SNPs in a case/control design, from the Wellcome Trust Case Control Consortium (WTCCC). In this data set, the results demonstrated again that HyB_BR performed as well as Bayesian mixture models with full MCMC for genomic prediction and inference of genetic architecture, while reducing the computational time from 45 h with full MCMC to 3 h with HyB_BR.
The results for quantitative traits in cattle and disease in humans demonstrate that HyB_BR performs as well as Bayesian mixture models implemented with full MCMC in terms of prediction accuracy, while running up to 17 times faster than the full MCMC implementations. The HyB_BR algorithm makes simultaneous genomic prediction, QTL mapping and inference of genetic architecture feasible in large genomic data sets.