De los Campos Gustavo, Gianola Daniel, Rosa Guilherme J M, Weigel Kent A, Crossa José
University of Wisconsin-Madison, 1675 Observatory Drive, WI 53706, USA.
Genet Res (Camb). 2010 Aug;92(4):295-308. doi: 10.1017/S0016672310000285.
Prediction of genetic values is a central problem in quantitative genetics. Over many decades, such predictions have been successfully accomplished using information on phenotypic records and family structure, usually represented with a pedigree. Dense molecular markers are now available for the genomes of humans, plants and animals, and this information can be used to enhance the prediction of genetic values. However, incorporating dense molecular marker data into models poses many statistical and computational challenges, such as how models can cope with the genetic complexity of multi-factorial traits and with the curse of dimensionality that arises when the number of markers exceeds the number of data points. Reproducing kernel Hilbert space (RKHS) regressions can be used to address some of these challenges. The methodology allows regressions on almost any type of predictor set (covariates, graphs, strings, images, etc.) and has important computational advantages relative to many parametric approaches. Moreover, some parametric models appear as special cases. This article provides an overview of the methodology, a discussion of the problem of kernel choice with a focus on genetic applications, algorithms for kernel selection, and an assessment of the proposed methods using a collection of 599 wheat lines evaluated for grain yield in four mega-environments.
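The computational idea behind RKHS regression can be illustrated with a kernel ridge fit. The following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes markers coded 0/1/2, a Gaussian kernel on the squared marker distance averaged over loci with a hypothetical bandwidth `h`, and a fixed smoothing parameter `lam` (in a mixed-model reading, a residual-to-genetic variance ratio). It does not reproduce the kernel-selection algorithms the article discusses. Two points are visible in the code. First, the fit solves an n × n system whose size depends on the number of lines n, not the number of markers p. Second, swapping in a linear kernel gives the same fitted values as ridge regression on the markers, which is one example of a parametric model arising as a special case. All function names here are hypothetical.

```python
import math
import random


def solve(a, b):
    """Solve a x = b for square a by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def gaussian_kernel(xi, xj, h):
    """Assumed Gaussian kernel exp(-h * d), d = squared distance averaged over markers."""
    d = sum((a - b) ** 2 for a, b in zip(xi, xj)) / len(xi)
    return math.exp(-h * d)


def linear_kernel(xi, xj, h=None):
    """Linear kernel x_i'x_j / p (h is ignored); reduces RKHS regression to ridge regression."""
    return sum(a * b for a, b in zip(xi, xj)) / len(xi)


def rkhs_fit(X, y, kernel, lam, h=1.0):
    """Fit f = mu + K alpha by solving (K + lam I) alpha = y - mu, an n x n system for any p."""
    n = len(X)
    mu = sum(y) / n
    K = [[kernel(X[i], X[j], h) for j in range(n)] for i in range(n)]
    A = [[K[i][j] + (lam if i == j else 0.0) for j in range(n)] for i in range(n)]
    alpha = solve(A, [yi - mu for yi in y])
    return mu, alpha


def rkhs_predict(X_train, mu, alpha, X_new, kernel, h=1.0):
    """Predict mu + sum_j alpha_j K(x, x_j) for each new genotype x."""
    return [mu + sum(a * kernel(x, xt, h) for a, xt in zip(alpha, X_train)) for x in X_new]


if __name__ == "__main__":
    rng = random.Random(1)
    n, p = 20, 200  # more markers than lines: the p > n setting
    X = [[rng.choice([0, 1, 2]) for _ in range(p)] for _ in range(n)]
    y = [0.5 * sum(x[:10]) + rng.gauss(0, 1) for x in X]  # simulated, hypothetical trait
    mu, alpha = rkhs_fit(X[:15], y[:15], gaussian_kernel, lam=0.5, h=1.0)
    preds = rkhs_predict(X[:15], mu, alpha, X[15:], gaussian_kernel, h=1.0)
    print([round(v, 2) for v in preds])
```

Fixing `lam` and `h` keeps the sketch short. In practice both would be chosen from the data, for example by cross-validation or within a Bayesian model, which is where the kernel-choice question raised in the abstract comes in.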