Liu Yun-Hua, Xu Yang, Zhang Meiping, Cui Yanru, Sze Sing-Hoi, Smith C Wayne, Xu Shizhong, Zhang Hong-Bin
Department of Soil and Crop Sciences, Texas A&M University, College Station, TX, United States.
Botany and Plant Sciences, University of California, Riverside, Riverside, CA, United States.
Front Plant Sci. 2020 Nov 9;11:583277. doi: 10.3389/fpls.2020.583277. eCollection 2020.
Accurate phenotype prediction of quantitative traits is paramount to enhanced plant research and breeding. Here, we report the accurate prediction of cotton fiber length, a typical quantitative trait, using 474 cotton (Gossypium ssp.) fiber length genes and nine prediction models. When the SNPs/InDels contained in 226 of the genes or the expressions of all 474 genes were used for fiber length prediction, a prediction accuracy of 0.83 was obtained, approaching the maximally possible prediction accuracy of a quantitative trait. This is a 116% improvement over the prediction accuracies of fiber length thus far achieved for genomic selection using genome-wide random DNA markers. Moreover, analysis of the genes identified 125 that are key to accurate prediction of fiber length; with these alone, a prediction accuracy similar to that of all 474 genes was obtained. The fiber lengths of the plants predicted with expressions of the 125 key genes were significantly correlated with those predicted with the SNPs/InDels of the 226 SNP/InDel-containing genes (r = 0.892, P = 0.000). The prediction accuracies of fiber length using both genic datasets were highly consistent across environments or generations. Finally, we found that a training population consisting of 100-120 plants was sufficient to train a model for accurate prediction of a quantitative trait using the genes controlling the trait. Therefore, the genes controlling a quantitative trait are capable of accurately predicting its phenotype, thereby dramatically improving the ability, accuracy, and efficiency of phenotype prediction and promoting gene-based breeding in cotton and other species.
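As an illustration of how a prediction accuracy of this kind is commonly estimated, the sketch below computes the Pearson correlation between observed phenotypes and k-fold cross-validated predictions. It rests on assumptions not taken from the abstract: the data are synthetic (120 hypothetical plants by 125 hypothetical gene features), ridge regression is a stand-in that may or may not be among the nine models used, and the function names, penalty `lam`, and fold count `k` are chosen only for the example.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge regression on centered data; returns (weights, intercept)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)
    return w, y_mean - x_mean @ w

def cv_prediction_accuracy(X, y, lam=1.0, k=10, seed=0):
    """Pearson r between observed values and k-fold cross-validated predictions."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    y_hat = np.empty(len(y))
    for test_idx in folds:
        # Train on every plant outside the held-out fold.
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        w, b = ridge_fit(X[train_idx], y[train_idx], lam)
        y_hat[test_idx] = X[test_idx] @ w + b
    return float(np.corrcoef(y, y_hat)[0, 1])

# Synthetic stand-in data (NOT the study's data): 120 plants, 125 gene features.
rng = np.random.default_rng(42)
n_plants, n_genes = 120, 125
X = rng.normal(size=(n_plants, n_genes))
true_effects = rng.normal(size=n_genes)  # hypothetical per-gene effects
y = X @ true_effects + rng.normal(scale=3.0, size=n_plants)

accuracy = cv_prediction_accuracy(X, y, lam=10.0)
print(f"cross-validated prediction accuracy r = {accuracy:.3f}")
```

Because every plant is predicted only by a model that never saw it during training, the resulting correlation reflects out-of-sample performance rather than goodness of fit. Rerunning the function on random subsets of increasing size would be one way to probe how accuracy depends on training population size.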