Facultad de Telemática, Universidad de Colima, Colima 28040, Colima, Mexico.
International Maize and Wheat Improvement Center (CIMMYT), Km 45, Carretera Mexico-Veracruz, Texcoco 52640, Edo. de México, Mexico.
Genes (Basel). 2024 Feb 24;15(3):286. doi: 10.3390/genes15030286.
Genomic selection (GS) is revolutionizing plant breeding. However, its practical implementation is still challenging because many factors affect its prediction accuracy. For this reason, this research explores data augmentation (DA) as a way to improve that accuracy. DA generates synthetic data from the original training set to enlarge it and thereby improve the prediction performance of statistical and machine learning algorithms. Combined with deep neural networks, DA has substantial empirical support in many computer vision applications. We therefore explored DA in the context of GS using 14 real datasets. We found empirical evidence that DA is a powerful tool for improving prediction accuracy, since it improved the prediction accuracy of the top lines in all 14 datasets under study. On average, across datasets and traits, the gain in prediction performance of the DA approach relative to the Conventional method for the top 20% of lines in the testing set was 108.4% in terms of the normalized root mean square error (NRMSE) and 107.4% in terms of the mean arctangent absolute percentage error (MAAPE). However, performance was worse on the whole testing set. We encourage further empirical evaluations to support these findings.
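To make the evaluation concrete, the following minimal sketch computes NRMSE and MAAPE on the top 20% of lines in a testing set. It rests on several assumptions not stated in the abstract. NRMSE is normalized here by the mean of the observed values, although other normalizations exist. The "top" lines are taken to be those with the highest observed phenotypic values. The percentage gain is computed as `(metric_conventional / metric_da - 1) * 100`. The helper names (`nrmse`, `maape`, `top_fraction`, `relative_gain`) are hypothetical and are not the authors' code.

```python
import math
from typing import List, Sequence, Tuple


def nrmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error divided by the mean observed value (assumed normalization)."""
    n = len(observed)
    mse = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n
    return math.sqrt(mse) / (sum(observed) / n)


def maape(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean arctangent absolute percentage error; assumes all observed values are nonzero."""
    return sum(math.atan(abs((o - p) / o)) for o, p in zip(observed, predicted)) / len(observed)


def top_fraction(
    observed: Sequence[float], predicted: Sequence[float], fraction: float = 0.2
) -> Tuple[List[float], List[float]]:
    """Keep the lines whose OBSERVED values are in the top `fraction` (assumed selection rule)."""
    k = max(1, int(round(len(observed) * fraction)))
    # Indices sorted by observed value, highest first.
    order = sorted(range(len(observed)), key=lambda i: observed[i], reverse=True)[:k]
    return [observed[i] for i in order], [predicted[i] for i in order]


def relative_gain(metric_conventional: float, metric_da: float) -> float:
    """Percentage gain of DA over the conventional method for an error metric (lower is better).

    Hypothetical formula: (conventional / DA - 1) * 100.
    """
    return (metric_conventional / metric_da - 1.0) * 100.0


if __name__ == "__main__":
    # Toy data for illustration only; not from the study.
    y_obs = [3.1, 4.8, 5.5, 6.0, 2.9, 7.2, 5.1, 4.4, 6.6, 3.8]
    y_conv = [3.5, 4.2, 5.0, 5.1, 3.3, 5.9, 4.8, 4.6, 5.7, 4.0]
    y_da = [3.6, 4.5, 5.4, 5.8, 3.4, 6.9, 5.0, 4.7, 6.4, 4.1]

    obs_top, conv_top = top_fraction(y_obs, y_conv)
    _, da_top = top_fraction(y_obs, y_da)
    print("NRMSE gain (%):", relative_gain(nrmse(obs_top, conv_top), nrmse(obs_top, da_top)))
    print("MAAPE gain (%):", relative_gain(maape(obs_top, conv_top), maape(obs_top, da_top)))
```

Restricting the metrics to the top subset mirrors the abstract's distinction between gains on the top 20% of lines and the worse performance observed on the whole testing set. Calling `nrmse(y_obs, y_da)` on the full vectors gives the whole-set counterpart.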