Howard Réka, Carriquiry Alicia L, Beavis William D
Department of Statistics, University of Nebraska, Lincoln, Nebraska 68583
Department of Statistics, Iowa State University, Ames, Iowa 50011.
G3 (Bethesda). 2017 Sep 7;7(9):3103-3113. doi: 10.1534/g3.117.044453.
An epistatic genetic architecture can have a significant impact on the prediction accuracies of genomic prediction (GP) methods. Machine learning methods predict traits with epistatic genetic architectures more accurately than statistical methods based on additive mixed linear models. The differences between these types of GP methods suggest a diagnostic for revealing the genetic architectures underlying traits of interest. In addition to genetic architecture, the performance of GP methods may be influenced by the sample size of the training population, the number of QTL, and the proportion of phenotypic variability due to genotypic variability (heritability). The number of possible values for these factors, and the number of combinations of factor levels that influence the performance of GP methods, can be large. Thus, efficient methods for identifying the combinations of factor levels that produce the most accurate GPs are needed. Herein, we employ response surface methods (RSMs) to find the experimental conditions that produce the most accurate GPs. We illustrate RSM with an example of simulated doubled haploid populations and identify the combination of factors that maximizes the difference between the prediction accuracies of best linear unbiased prediction (BLUP) and support vector machine (SVM) GP methods. The greatest impact on the response is due to the genetic architecture of the population, the heritability of the trait, and the sample size. When epistasis is responsible for all of the genotypic variance, heritability is equal to one, and the sample size of the training population is large, the advantage of using the SVM method over the BLUP method is greatest. However, except for values close to the maximum, most of the response surface shows little difference between the methods. We also determined that the conditions resulting in the greatest prediction accuracy for BLUP occurred when the genetic architecture consists solely of additive effects and heritability is equal to one.
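The response surface idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' analysis: the two coded factors, the toy response function, and the helper names (`quadratic_design`, `fit_response_surface`, `stationary_point`) are all hypothetical and chosen only for illustration. The sketch fits a second-order polynomial to responses observed on a small factorial design and then locates the stationary point of the fitted surface.

```python
import numpy as np

def quadratic_design(X):
    """Design matrix with intercept, linear, pure quadratic, and interaction columns."""
    n, k = X.shape
    cols = [np.ones(n)]
    cols += [X[:, i] for i in range(k)]
    cols += [X[:, i] ** 2 for i in range(k)]
    cols += [X[:, i] * X[:, j] for i in range(k) for j in range(i + 1, k)]
    return np.column_stack(cols)

def fit_response_surface(X, y):
    """Ordinary least-squares fit of the second-order model."""
    beta, *_ = np.linalg.lstsq(quadratic_design(X), y, rcond=None)
    return beta

def stationary_point(beta, k):
    """Solve b + 2 B x = 0 for the fitted model y = b0 + x'b + x'Bx."""
    b = beta[1:k + 1]
    B = np.diag(beta[k + 1:2 * k + 1])
    idx = 2 * k + 1
    for i in range(k):
        for j in range(i + 1, k):
            # Each interaction coefficient is split across the two off-diagonal cells.
            B[i, j] = B[j, i] = beta[idx] / 2.0
            idx += 1
    x_s = -0.5 * np.linalg.solve(B, b)
    # All-negative eigenvalues of B indicate that the stationary point is a maximum.
    return x_s, np.linalg.eigvalsh(B)

# Hypothetical 3 x 3 factorial design in coded units (-1, 0, 1) for two factors.
levels = [-1.0, 0.0, 1.0]
X = np.array([[a, b] for a in levels for b in levels])

# Toy noise-free response (an assumption, not data from the study),
# constructed to have its maximum at (0.3, -0.2).
y = 1.0 - (X[:, 0] - 0.3) ** 2 - 2.0 * (X[:, 1] + 0.2) ** 2

beta = fit_response_surface(X, y)
x_s, eig = stationary_point(beta, k=2)
print("stationary point (coded units):", x_s)
print("eigenvalues of B:", eig)
```

In this toy case the maximum lies inside the design region. When the optimum lies on the boundary of the factor space, as the abstract reports for heritability equal to one, a stationary-point calculation alone would not suffice, and the fitted surface would instead be evaluated over the feasible region.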