INRES-Plant Breeding, Rheinische Friedrich-Wilhelms-Universität Bonn, 53113 Bonn, Germany.
INRES-Plant Nutrition, Rheinische Friedrich-Wilhelms-Universität Bonn, 53113 Bonn, Germany.
Int J Mol Sci. 2023 Sep 19;24(18):14275. doi: 10.3390/ijms241814275.
Estimation and prediction play a key role in breeding programs. Currently, phenotyping complex traits such as nitrogen use efficiency (NUE) in wheat remains expensive, requires high-throughput technologies, and is very time-consuming compared with genotyping. Researchers are therefore trying to predict phenotypes from marker information. Genetic parameters such as population structure, the genomic relationship matrix, marker density and sample size are major factors that increase the performance and accuracy of a model. They also play an important role in adjusting the false discovery rate (FDR) threshold used to declare statistical significance in estimation. In parallel, many genetic hyper-parameters are hidden, that is, not represented in the given genomic selection (GS) model, yet have significant effects on the results. Examples include panel size, number of markers, minor allele frequency, call rate of each marker, number of cross-validations and batch size in the training set of the genomic file. The main challenge is to ensure the reliability and accuracy of the resulting predicted breeding values (BVs). Our study confirmed the bias-variance tradeoff and adaptive prediction error results for the ensemble-learning-based model STACK, which showed the highest performance among the compared models when estimating genetic parameters and hyper-parameters in a given GS model.
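Several of the hidden hyper-parameters listed above are marker-level quality-control thresholds (minor allele frequency, call rate) and the cross-validation scheme. The following is a minimal Python sketch, not the authors' pipeline, of how such filters and a k-fold partition of lines might be applied before fitting a GS model. It assumes diploid genotypes coded as 0/1/2 allele dosages with `None` for missing calls; the function names and the default thresholds (MAF >= 0.05, call rate >= 0.9) are hypothetical illustrations, not values reported in the study.

```python
import random
from typing import List, Optional, Sequence

# Assumed coding: 0, 1 or 2 copies of the counted allele; None = missing call.
Genotype = Optional[int]


def call_rate(column: Sequence[Genotype]) -> float:
    """Fraction of lines with a non-missing call for one marker."""
    called = sum(1 for g in column if g is not None)
    return called / len(column)


def minor_allele_frequency(column: Sequence[Genotype]) -> float:
    """MAF from the non-missing dosages (assumes diploid 0/1/2 coding)."""
    called = [g for g in column if g is not None]
    if not called:
        return 0.0
    p = sum(called) / (2 * len(called))  # frequency of the counted allele
    return min(p, 1.0 - p)


def filter_markers(
    genotypes: List[List[Genotype]],  # rows = lines, columns = markers
    maf_min: float = 0.05,            # hypothetical threshold
    call_rate_min: float = 0.9,       # hypothetical threshold
) -> List[int]:
    """Return indices of markers that pass both the call-rate and MAF filters."""
    kept = []
    for j in range(len(genotypes[0])):
        column = [row[j] for row in genotypes]
        if call_rate(column) >= call_rate_min and minor_allele_frequency(column) >= maf_min:
            kept.append(j)
    return kept


def kfold_indices(n: int, k: int, seed: int = 0) -> List[List[int]]:
    """Shuffle line indices 0..n-1 and split them into k near-equal validation folds."""
    if not 1 < k <= n:
        raise ValueError("need 1 < k <= n")
    order = list(range(n))
    random.Random(seed).shuffle(order)
    base, extra = divmod(n, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)  # spread the remainder over the first folds
        folds.append(order[start:start + size])
        start += size
    return folds


if __name__ == "__main__":
    # Toy panel: 4 lines x 4 markers (illustrative data only).
    panel = [
        [0, 2, 1, None],
        [0, 2, 0, None],
        [0, 1, 2, 1],
        [0, 2, 1, None],
    ]
    # Marker 0 is monomorphic (MAF 0); marker 3 has call rate 0.25.
    print(filter_markers(panel))  # → [1, 2]
    print(kfold_indices(10, 3))
```

Each fold serves once as the validation set while the remaining lines train the model. Because the filter thresholds and the number of folds change which markers and lines the model sees, varying them is one way to probe how sensitive predicted BVs are to these hidden hyper-parameters.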