Department of Animal Science, University of Nebraska-Lincoln, Lincoln, NE, USA.
Department of Animal Sciences, University of Wisconsin-Madison, Madison, WI, USA; Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, USA; Department of Dairy Science, University of Wisconsin-Madison, Madison, WI, USA.
Front Genet. 2014 Oct 16;5:363. doi: 10.3389/fgene.2014.00363. eCollection 2014.
Prediction of genetic values has been a focus of applied quantitative genetics since the beginning of the 20th century, with renewed interest following the advent of whole genome-enabled prediction. The emergence of high-dimensional genomic data, especially molecular markers, fueled by post-Sanger sequencing technologies, has created opportunities that have driven researchers to extend the models of Ronald Fisher and Sewall Wright to confront new challenges. In particular, kernel methods are gaining attention as regression methods of choice for genome-enabled prediction. Complex traits are presumably influenced by many genomic regions acting in concert (clearly so when considering pathways), thus generating interactions. Motivated by this view, a growing number of kernel-based statistical approaches attempt to capture non-additive effects, either parametrically or non-parametrically. This review centers on whole-genome regression using kernel methods applied to a wide range of quantitative traits of agricultural importance in animals and plants. We discuss various kernel-based approaches tailored to capturing total genetic variation, with the aim of enhancing predictive performance in light of available genome annotation information. Connections between prediction machines born in animal breeding, statistics, and machine learning are revisited, and their empirical predictive performance is discussed. Overall, while some encouraging results have been obtained with non-parametric kernels, recovering non-additive genetic variation in a validation dataset remains a challenge in quantitative genetics.
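The contrast between an additive (parametric) kernel and a non-parametric kernel can be illustrated with kernel ridge regression, the RKHS form that is equivalent to BLUP when the kernel is used as a covariance matrix: training gives alpha = (K + λI)^{-1}(y - ȳ) and a new genotype x is predicted as ȳ + Σ_i alpha_i k(x_i, x). The sketch below is a minimal illustration under stated assumptions, not the review's method: genotypes are assumed coded -1/0/1, the linear kernel is a GBLUP-like scaled inner product, the Gaussian bandwidth `h`, shrinkage `lam`, and the simulated trait (five additive markers plus one pairwise epistatic term) are all hypothetical choices, and helper names such as `fit_kernel_regression` are illustrative only.

```python
import math
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x


def linear_kernel(a, b):
    """Additive, GBLUP-like kernel: inner product of genotype codes scaled by marker count."""
    return sum(x * y for x, y in zip(a, b)) / len(a)


def gaussian_kernel(a, b, h=1.0):
    """Non-parametric Gaussian kernel on squared Euclidean distance scaled by marker count; h is a hypothetical bandwidth."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-h * d2 / len(a))


def fit_kernel_regression(X, y, kernel, lam):
    """Kernel ridge (RKHS) regression: alpha = (K + lam*I)^-1 (y - mean(y))."""
    n = len(X)
    mu = sum(y) / n
    K = [[kernel(X[i], X[j]) + (lam if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    alpha = solve(K, [yi - mu for yi in y])

    def predict(x_new):
        # f(x) = mu + sum_i alpha_i * k(x_i, x)
        return mu + sum(a * kernel(xi, x_new) for a, xi in zip(alpha, X))

    return predict


def pearson(u, v):
    """Pearson correlation, used here as a predictive-ability measure."""
    mu_u, mu_v = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu_u) * (b - mu_v) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu_u) ** 2 for a in u))
    sv = math.sqrt(sum((b - mu_v) ** 2 for b in v))
    return cov / (su * sv)


if __name__ == "__main__":
    rng = random.Random(1)
    n, p = 120, 50
    # Hypothetical simulation: genotypes coded -1/0/1 at p markers
    X = [[rng.choice((-1, 0, 1)) for _ in range(p)] for _ in range(n)]
    # Hypothetical genetic value: additive effects at markers 0-4 plus one epistatic term
    g = [sum(x[:5]) + 2.0 * x[5] * x[6] for x in X]
    y = [gi + rng.gauss(0.0, 1.0) for gi in g]
    train, test = range(80), range(80, n)
    for name, k in [("linear", linear_kernel), ("gaussian", gaussian_kernel)]:
        f = fit_kernel_regression([X[i] for i in train], [y[i] for i in train], k, lam=1.0)
        r = pearson([f(X[i]) for i in test], [g[i] for i in test])
        print(f"{name} kernel: corr(predicted, true g) in test set = {r:.3f}")
```

Swapping only the kernel function while keeping the same solver mirrors the idea that the choice of kernel, rather than the regression machinery, determines which kinds of genetic signal (additive or non-additive) the model can represent. Which kernel predicts better in such a toy run depends on the simulated architecture and on `h` and `lam`, so no particular outcome should be read into it.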