Manthena Vamsi, Jarquín Diego, Varshney Rajeev K, Roorkiwal Manish, Dixit Girish Prasad, Bharadwaj Chellapilla, Howard Reka
Department of Statistics, University of Nebraska-Lincoln, Lincoln, NE, United States.
Agronomy Department, University of Florida, Gainesville, FL, United States.
Front Genet. 2022 Oct 14;13:958780. doi: 10.3389/fgene.2022.958780. eCollection 2022.
The development of genomic selection (GS) methods has allowed plant breeding programs to select favorable lines using genomic data before performing field trials. Improvements in genotyping technology have yielded high-dimensional genomic marker data, which can be difficult to incorporate into statistical models. In this paper, we investigated the utility of applying dimensionality reduction (DR) methods as a pre-processing step for GS methods. We compared five DR methods and studied how the prediction accuracy of each method changed as a function of the number of features retained. The effect of the DR methods was studied using three prediction models that involved the main effects of line, environment, and marker, and the genotype-by-environment interactions. The methods were applied to a real data set containing 315 lines phenotyped in nine environments, with 26,817 markers per line. Regardless of the DR method and prediction model used, only a fraction of the features was needed to reach the maximum prediction accuracy, measured as a correlation. Our results underline the usefulness of DR methods as a key pre-processing step in GS models for improving computational efficiency as genomic data sets continue to grow.
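The workflow described in the abstract (reduce the marker matrix, fit a prediction model, and track accuracy as the number of retained features grows) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not name the five DR methods or the three models, so principal component analysis stands in for a DR method, ridge regression stands in for a prediction model, and the data are simulated rather than the 315-line data set. The function names `pca_scores`, `ridge_predict`, and `accuracy_by_features` are hypothetical.

```python
import numpy as np


def pca_scores(X, k_max):
    """Return the first k_max principal component scores of X (lines x markers).

    PCA is an assumed stand-in for a DR method; it is unsupervised,
    so it uses marker data only and never sees the phenotypes.
    """
    Xc = X - X.mean(axis=0)                      # center each marker column
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :k_max] * s[:k_max]              # scores = U * singular values


def ridge_predict(Z_train, y_train, Z_test, lam=1.0):
    """Fit ridge regression on reduced features and predict the test lines."""
    mu = y_train.mean()
    k = Z_train.shape[1]
    A = Z_train.T @ Z_train + lam * np.eye(k)    # penalized normal equations
    beta = np.linalg.solve(A, Z_train.T @ (y_train - mu))
    return mu + Z_test @ beta


def accuracy_by_features(X, y, ks, n_test=20, seed=0):
    """Correlation between predicted and observed values for each k in ks."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    test, train = idx[:n_test], idx[n_test:]
    Z = pca_scores(X, max(ks))                   # compute once, slice per k
    result = {}
    for k in ks:
        pred = ridge_predict(Z[train, :k], y[train], Z[test, :k])
        result[k] = float(np.corrcoef(pred, y[test])[0, 1])
    return result


if __name__ == "__main__":
    # Simulated toy data (not the data set from the paper):
    # 60 lines, 500 markers coded 0/1/2, additive effects plus noise.
    rng = np.random.default_rng(1)
    X = rng.integers(0, 3, size=(60, 500)).astype(float)
    y = X @ rng.normal(0.0, 0.1, size=500) + rng.normal(0.0, 1.0, size=60)
    for k, r in accuracy_by_features(X, y, ks=[2, 5, 10, 20, 39]).items():
        print(f"features retained: {k:3d}  correlation: {r:.3f}")
```

Plotting the returned correlations against the number of retained features would give the kind of trend curve the abstract refers to. The sketch uses a single random train/test split for brevity; a cross-validation scheme would be the more usual choice in practice.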