Chen Chunyu, Steibel Juan P, Tempelman Robert J
Department of Animal Science, Michigan State University, East Lansing, Michigan 48824
Genetics. 2017 Aug;206(4):1791-1806. doi: 10.1534/genetics.117.202259. Epub 2017 Jun 21.
A currently popular strategy (EMMAX) for genome-wide association (GWA) analysis infers association for the specific marker of interest by treating its effect as fixed while treating all other marker effects as classical Gaussian random effects. It may be more statistically coherent to specify all markers as sharing the same prior distribution, whether that distribution is Gaussian, heavy-tailed (BayesA), or has variable selection specifications based on a mixture of, say, two Gaussian distributions [stochastic search and variable selection (SSVS)]. Furthermore, all such GWA inference should be formally based on posterior probabilities or test statistics as we present here, rather than merely being based on point estimates. We compared these three broad categories of priors within a simulation study to investigate the effects of different degrees of skewness for quantitative trait loci (QTL) effects and numbers of QTL using 43,266 SNP marker genotypes from 922 Duroc-Pietrain F-cross pigs. Genomic regions were based either on single SNP associations, on nonoverlapping windows of various fixed sizes (0.5-3 Mb), or on adaptively determined windows that cluster the genome into blocks based on linkage disequilibrium. We found that SSVS and BayesA lead to the best receiver operating characteristic (ROC) curve properties in almost all cases. We also evaluated approximate maximum a posteriori (MAP) approaches to BayesA and SSVS as potential computationally feasible alternatives; however, MAP inferences were not promising, particularly due to their sensitivity to starting values. We determined that it is advantageous to use variable selection specifications based on adaptively constructed genomic window lengths for GWA studies.
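As an illustration of window-based inference under a variable selection prior, the following minimal sketch computes a posterior probability of association for each nonoverlapping fixed-size window. It is not the authors' implementation. It assumes that MCMC output is available as a matrix of 0/1 inclusion indicators (samples by SNPs), that all SNPs lie on a single chromosome, and that a window counts as associated in a given sample when at least one of its SNPs is included. The function name, argument names, and the 1 Mb default are hypothetical.

```python
import numpy as np


def window_posterior_probabilities(inclusion_samples, positions_bp,
                                   window_size_bp=1_000_000):
    """Posterior probability of association per fixed-size window.

    inclusion_samples: (n_samples, n_snps) array of 0/1 indicators, one row
        per MCMC sample (assumed output of an SSVS-type sampler).
    positions_bp: (n_snps,) base-pair positions on one chromosome (assumed).
    window_size_bp: width of each nonoverlapping window in base pairs.
    Returns a dict mapping window index to the fraction of samples in which
    at least one SNP in that window was included.
    """
    inclusion_samples = np.asarray(inclusion_samples, dtype=bool)
    positions_bp = np.asarray(positions_bp)

    # Assign each SNP to a window by integer division of its position.
    window_ids = positions_bp // window_size_bp

    probabilities = {}
    for window in np.unique(window_ids):
        in_window = window_ids == window
        # A window is "hit" in a sample if any of its SNPs is included.
        hit_per_sample = inclusion_samples[:, in_window].any(axis=1)
        probabilities[int(window)] = float(hit_per_sample.mean())
    return probabilities
```

Adaptively determined windows could reuse the same aggregation by replacing the integer-division step with block labels derived from linkage disequilibrium clustering.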