National Key Facility for Crop Gene Resources and Genetic Improvement, and Institute of Crop Sciences, Chinese Academy of Agricultural Sciences (CAAS), Beijing 100081, China.
International Maize and Wheat Improvement Center (CIMMYT) China Office, c/o CAAS, 12 Zhongguancun South Street, Beijing 100081, China.
Int J Mol Sci. 2020 Feb 17;21(4):1342. doi: 10.3390/ijms21041342.
Genomic selection (GS) is a strategy to predict the genetic merit of individuals using genome-wide markers. However, GS prediction accuracy is affected by many factors, including the missing rate and minor allele frequency (MAF) of the genotypic data, the GS model, and trait features. In this study, we used one wheat population to investigate the prediction accuracies of various GS models for yield and yield-related traits under various quality control (QC) scenarios, with missing genotype imputation, and with markers derived from genome-wide association studies (GWAS). The missing rate and MAF of single nucleotide polymorphism (SNP) markers were the two major factors in QC. Five missing rate levels (0%, 20%, 40%, 60%, and 80%) and three MAF levels (0%, 5%, and 10%) were considered, and five-fold cross-validation was used to estimate prediction accuracy. The results indicated that a moderate missing rate threshold (20% to 40%) and an MAF threshold of 5% provided better prediction accuracy. Under this QC scenario, prediction accuracies were further calculated for imputed and GWAS-derived markers. The accuracies for the six traits were related to their heritability and genetic architecture, as well as to the GS prediction model. Moore-Penrose generalized inverse (GenInv), ridge regression (RidgeReg), and random forest (RForest) resulted in higher prediction accuracies than the other GS models across traits. Imputation of missing genotypic data had a marginal effect on prediction accuracy, while GWAS-derived markers improved prediction accuracy in most cases. These results demonstrate that QC on missing rate and MAF had a positive impact on the predictability of GS models. We failed to identify a single combination of QC scenarios that outperformed the others for all traits and GS models. However, the balance between marker number and marker quality is important for the deployment of GS in wheat breeding. GWAS can select the markers most strongly associated with traits, and can therefore be used to improve the prediction accuracy of GS.
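The QC and cross-validation steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline. It assumes genotypes coded 0/1/2 with NaN for missing calls, uses simple marker-mean imputation as a stand-in for whatever imputation method the study used, fits ridge regression with a fixed, arbitrarily chosen penalty, and takes prediction accuracy to be the Pearson correlation between predicted and observed phenotypes (a common convention that the abstract does not spell out). All function names and parameters are hypothetical.

```python
import numpy as np


def qc_filter(G, max_missing=0.2, min_maf=0.05):
    """Return a boolean mask of markers that pass both QC thresholds.

    G is an (individuals x markers) array coded 0/1/2, NaN = missing call
    (an assumed coding, not taken from the source).
    """
    called = ~np.isnan(G)
    n_called = called.sum(axis=0)
    missing_rate = 1.0 - n_called / G.shape[0]
    # Frequency of the counted allele, using non-missing calls only.
    allele_sum = np.where(called, G, 0.0).sum(axis=0)
    freq = np.divide(allele_sum, 2.0 * n_called,
                     out=np.zeros(G.shape[1]), where=n_called > 0)
    maf = np.minimum(freq, 1.0 - freq)
    return (missing_rate <= max_missing) & (maf >= min_maf)


def mean_impute(G):
    """Replace missing calls with the marker mean (a simple stand-in)."""
    col_means = np.nanmean(G, axis=0)
    return np.where(np.isnan(G), col_means, G)


def ridge_fit_predict(X_train, y_train, X_test, lam=1.0):
    """Ridge regression in dual form (n x n system), suited to markers >> individuals."""
    mu_x, mu_y = X_train.mean(axis=0), y_train.mean()
    Xc = X_train - mu_x
    K = Xc @ Xc.T
    alpha = np.linalg.solve(K + lam * np.eye(len(y_train)), y_train - mu_y)
    return mu_y + (X_test - mu_x) @ Xc.T @ alpha


def five_fold_accuracy(X, y, lam=1.0, k=5, seed=0):
    """Mean Pearson correlation between predictions and phenotypes over k folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    accs = []
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        pred = ridge_fit_predict(X[train_idx], y[train_idx], X[test_idx], lam)
        accs.append(np.corrcoef(pred, y[test_idx])[0, 1])
    return float(np.mean(accs))


if __name__ == "__main__":
    # Simulated data only; no values here come from the study.
    rng = np.random.default_rng(1)
    n, m = 200, 100
    G = rng.binomial(2, rng.uniform(0.05, 0.5, size=m), size=(n, m)).astype(float)
    effects = np.zeros(m)
    effects[:10] = rng.normal(size=10)
    y = G @ effects + rng.normal(size=n)
    G[rng.random(G.shape) < 0.05] = np.nan  # sprinkle in missing calls

    keep = qc_filter(G, max_missing=0.2, min_maf=0.05)
    # Imputing before splitting is a simplification; a stricter setup
    # would estimate marker means from the training folds only.
    X = mean_impute(G[:, keep])
    print(keep.sum(), "markers kept; accuracy:", five_fold_accuracy(X, y, lam=10.0))
```

Looping `qc_filter` over the thresholds named in the abstract (missing rate 0% to 80%, MAF 0% to 10%) and recording `five_fold_accuracy` for each combination would reproduce the shape of the comparison, though not the study's models or data.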