Improving the computational efficiency of fully Bayes inference and assessing the effect of misspecification of hyperparameters in whole-genome prediction models.

Authors

Yang Wenzhao, Chen Chunyu, Tempelman Robert J

Affiliation

Department of Animal Science, Michigan State University, East Lansing, MI, 48824-1225, USA.

Publication information

Genet Sel Evol. 2015 Mar 7;47(1):13. doi: 10.1186/s12711-015-0092-x.

Abstract

BACKGROUND

The reliability of whole-genome prediction (WGP) models based on high-density single nucleotide polymorphism (SNP) panels critically depends on proper specification of key hyperparameters. A currently popular WGP model labeled BayesB specifies a hyperparameter π that is loosely used to describe the proportion of SNPs that are in linkage disequilibrium (LD) with causal variants. The remaining markers are specified to be random draws from a Student t distribution with key hyperparameters being degrees of freedom v and scale s².

METHODS

We consider three alternative Markov chain Monte Carlo (MCMC) approaches based on the use of Metropolis-Hastings (MH) to estimate these key hyperparameters. The first approach, termed DFMH, is based on a previously published strategy in which s² is drawn by a Gibbs step and v is drawn by an MH step. The second strategy, termed UNIMH, substitutes MH for Gibbs when drawing s² and further collapses or marginalizes the full conditional density of v. The third strategy, termed BIVMH, is based on jointly drawing the two hyperparameters in a bivariate MH step. We also tested the effect of misspecifying s² on the accuracy of genomic estimated breeding values (GEBV), while still allowing for inference on the other hyperparameters.
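To make the first kind of update concrete, the following is a minimal Python sketch of a Gibbs draw of the scale combined with an MH draw of the degrees of freedom, in the spirit of the DFMH description above. It is not the authors' implementation. Everything beyond "Gibbs for the scale, MH for the degrees of freedom" is an assumption made for illustration: the SNP-specific variances `sigma2` are treated as fixed inputs with a scaled inverse chi-square prior, the prior on v is taken as proportional to (1 + v)^-2, the prior on the scale is flat, and the MH proposal is a random walk on log v. All function names are hypothetical.

```python
import math
import numpy as np

def log_cond_v(v, s2, sigma2):
    """Log full conditional of v up to a constant.
    Assumes sigma2_j ~ scaled-inv-chi2(v, s2) and prior p(v) proportional to (1 + v)^-2."""
    m = sigma2.size
    half = 0.5 * v
    return (m * (half * math.log(half * s2) - math.lgamma(half))
            - (half + 1.0) * np.sum(np.log(sigma2))
            - half * s2 * np.sum(1.0 / sigma2)
            - 2.0 * math.log1p(v))

def gibbs_s2(v, sigma2, rng):
    """Gibbs draw of s2 from its Gamma full conditional (assumes a flat prior on s2)."""
    shape = 0.5 * sigma2.size * v + 1.0
    rate = 0.5 * v * np.sum(1.0 / sigma2)
    return rng.gamma(shape, 1.0 / rate)

def mh_v(v, s2, sigma2, rng, step=0.1):
    """Random-walk MH update of v on the log scale (assumed proposal)."""
    v_new = v * math.exp(step * rng.standard_normal())
    log_ratio = (log_cond_v(v_new, s2, sigma2) - log_cond_v(v, s2, sigma2)
                 + math.log(v_new) - math.log(v))  # Jacobian of the log transform
    log_u = -rng.standard_exponential()            # log of a Uniform(0, 1) draw
    return v_new if log_u < log_ratio else v

def sample_hyperparameters(sigma2, n_iter=2000, seed=0):
    """Alternate the Gibbs step for s2 and the MH step for v; return draws of (v, s2)."""
    rng = np.random.default_rng(seed)
    v, s2 = 4.0, float(np.mean(sigma2))  # arbitrary starting values
    draws = np.empty((n_iter, 2))
    for t in range(n_iter):
        s2 = gibbs_s2(v, sigma2, rng)
        v = mh_v(v, s2, sigma2, rng)
        draws[t] = v, s2
    return draws
```

In a full WGP sampler the SNP variances would themselves be updated every iteration; holding them fixed here only isolates the hyperparameter updates. A bivariate scheme such as BIVMH would instead propose v and the scale together and accept or reject them jointly.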

RESULTS

The UNIMH and BIVMH strategies had significantly greater (P < 0.05) computational efficiencies for estimating v and s² than DFMH in BayesA (π = 1) and BayesB implementations. We drew similar conclusions based on an analysis of the public domain heterogeneous stock mice data. We also found that overspecifying s² significantly reduced (P < 0.01) the accuracy of GEBV under BayesA, whereas BayesB was more robust to such misspecification. However, understating s² was compensated by counterbalancing inferences on v in BayesA and BayesB, and on π in BayesB.
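The abstract does not say how computational efficiency was measured. One common way to compare MCMC samplers is the effective sample size (ESS) of a chain, often divided by compute time. The sketch below is a simple ESS estimator offered under that assumption; the function name and the rule of truncating the autocorrelation sum at the first non-positive lag are illustrative choices, not taken from the paper.

```python
import numpy as np

def effective_sample_size(chain):
    """Estimate ESS as n / (1 + 2 * sum of positive-lag autocorrelations),
    truncating the sum at the first non-positive autocorrelation."""
    x = np.asarray(chain, dtype=float)
    n = x.size
    x = x - x.mean()
    # Autocovariance at lags 0..n-1
    acov = np.correlate(x, x, mode="full")[n - 1:] / n
    if acov[0] == 0.0:
        return float(n)  # constant chain: no autocorrelation to estimate
    rho = acov / acov[0]
    tau = 1.0  # integrated autocorrelation time
    for k in range(1, n):
        if rho[k] <= 0.0:
            break
        tau += 2.0 * rho[k]
    return n / tau
```

Applied to the draws of v or the scale from two samplers run for the same number of iterations, a larger ESS indicates a chain that mixes better; dividing by wall-clock time gives a per-second comparison.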

CONCLUSIONS

Sampling strategies based solely on MH updates of v and s², and collapsed representations of full conditional densities, can improve the computational efficiency of MCMC relative to the use of Gibbs updates. We believe that proper inferences on s², v and π are vital to ensure that the accuracy of GEBV is maximized when using parametric WGP models.
