Department of Animal Sciences, University of Wisconsin-Madison, Madison, WI, USA.
Genet Sel Evol. 2013 Jun 13;45(1):17. doi: 10.1186/1297-9686-45-17.
Arguably, genotypes and phenotypes may be linked in functional forms that are not well addressed by the linear additive models that are standard in quantitative genetics. Therefore, it is of great importance to develop statistical learning models that predict phenotypic values from all available molecular information and that can capture complex genetic network architectures. Bayesian kernel ridge regression is a non-parametric prediction model proposed for this purpose. Its essence is to create a spatial distance-based relationship matrix called a kernel. Although single nucleotide polymorphism genotypes are discrete, so that the set of all genotype configurations on which a model is built is finite, past research has mainly used a Gaussian kernel.
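To make the idea of a distance-based kernel concrete, the following is a minimal sketch of kernel ridge prediction with a Gaussian kernel. It is not the study's implementation: the function names, the bandwidth `theta`, the scaling of distances by the number of markers, the ridge parameter `lam`, and the simulated genotypes coded 0/1/2 are all assumptions made for illustration. With fixed variance components, `lam` plays the role of the ratio of residual to kernel variance; a fully Bayesian fit would also infer those variances rather than fix them.

```python
import numpy as np

def gaussian_kernel(X, Z, theta):
    """K[i, j] = exp(-theta * ||X[i] - Z[j]||^2 / p), with p markers.

    Dividing by p is an illustrative scaling choice, not taken from the source.
    """
    X = np.asarray(X, dtype=float)
    Z = np.asarray(Z, dtype=float)
    # Squared Euclidean distances between every row of X and every row of Z.
    sq = (X**2).sum(axis=1)[:, None] + (Z**2).sum(axis=1)[None, :] - 2.0 * X @ Z.T
    sq = np.maximum(sq, 0.0)  # guard against tiny negative rounding errors
    return np.exp(-theta * sq / X.shape[1])

def fit_kernel_ridge(K, y, lam):
    """Solve (K + lam * I) alpha = y for the kernel regression coefficients."""
    n = K.shape[0]
    return np.linalg.solve(K + lam * np.eye(n), y)

# Simulated data (hypothetical): 60 individuals, 200 SNPs coded 0/1/2.
rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(60, 200))
y = rng.normal(size=60)

X_train, X_test = X[:50], X[50:]
y_train = y[:50]

K_train = gaussian_kernel(X_train, X_train, theta=1.0)
alpha = fit_kernel_ridge(K_train, y_train - y_train.mean(), lam=1.0)

# Predictions for new individuals use their kernel values against the training set.
K_test = gaussian_kernel(X_test, X_train, theta=1.0)
y_pred = y_train.mean() + K_test @ alpha
```

In practice `theta` and `lam` would be tuned, for example by cross-validation, or treated as unknowns in the Bayesian model.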
We investigated the performance of a diffusion kernel, which was specifically developed to model discrete marker inputs, using Holstein cattle and wheat data. This kernel can be viewed as a discretization of the Gaussian kernel. The predictive ability of the diffusion kernel was similar to that of non-spatial distance-based additive genomic relationship kernels in the Holstein data, and the diffusion kernel outperformed the latter in the wheat data. However, the difference in performance between the diffusion and Gaussian kernels was negligible.
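One way to see why a diffusion kernel resembles a discretized Gaussian kernel is to write it out for categorical inputs. The sketch below assumes each locus is a node on a complete graph over three unordered genotype classes, for which the graph diffusion kernel has a closed form that depends only on the Hamming distance between two genotype vectors. This graph choice, the function name `diffusion_kernel`, the rate parameter `beta`, and the simulated genotypes are illustrative assumptions and may differ from the parameterization used in the study.

```python
import numpy as np

def diffusion_kernel(X, Z, beta, n_states=3):
    """Diffusion kernel for categorical markers (illustrative assumption:
    a complete graph over n_states genotype classes at every locus).

    Normalized so that K(x, x) = 1:
        K(x, z) = r ** hamming(x, z),
        r = (1 - exp(-n_states * beta)) / (1 + (n_states - 1) * exp(-n_states * beta))
    """
    X = np.asarray(X)
    Z = np.asarray(Z)
    # Number of loci at which two individuals carry different genotypes.
    hamming = (X[:, None, :] != Z[None, :, :]).sum(axis=2)
    decay = np.exp(-n_states * beta)
    ratio = (1.0 - decay) / (1.0 + (n_states - 1) * decay)
    return ratio ** hamming

# Simulated genotypes (hypothetical): 20 individuals, 100 SNPs coded 0/1/2.
rng = np.random.default_rng(1)
G = rng.integers(0, 3, size=(20, 100))
K_diff = diffusion_kernel(G, G, beta=1.0)
```

Because `ratio ** hamming` equals `exp(-c * hamming)` with `c = -log(ratio)`, this kernel decays exponentially in a discrete distance, much as the Gaussian kernel decays exponentially in squared Euclidean distance.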
We conclude that the ability of a diffusion kernel to capture the total genetic variance is not better than that of a Gaussian kernel, at least for these data. Although the diffusion kernel as a choice of basis function may have potential for use in whole-genome prediction, our results imply that embedding genetic markers into a non-Euclidean metric space has a very small impact on prediction. Our results suggest that use of the black-box Gaussian kernel is justified, given its connection to the diffusion kernel and its similar predictive performance.