Facultad de Telemática, Universidad de Colima, Colima 28040, Mexico.
Centro Universitario de Ciencias Exactas e Ingenierías (CUCEI), Universidad de Guadalajara, Guadalajara 44430, Mexico.
Genes (Basel). 2024 Jul 23;15(8):969. doi: 10.3390/genes15080969.
Genomic selection (GS) is changing plant breeding by substantially reducing the resources needed for phenotyping. However, its accuracy can suffer when the training and testing sets are mismatched, that is, when the model fitted on the training set does not adequately reflect the genetic and environmental conditions of the target population. To address this challenge, this study introduces a straightforward method that uses binary-Lasso regression to estimate coefficients. In this approach, the response variable is set to 1 for testing-set inputs and 0 for training-set inputs. Lasso, Ridge, and Elastic Net regression models are then trained using the inverse of the absolute values of these coefficients as weights (WLasso, WRidge, and WElastic Net). This weighting gives less importance to features that discriminate more strongly between the training and testing sets. The method is evaluated on six datasets and shows consistent improvements in normalized root mean square error. Importantly, it is easy to implement with the glmnet library, which directly supports weighting coefficients.
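The two steps can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation. A least-squares Lasso fitted by cyclic coordinate descent stands in for both the binary-Lasso step and the final weighted model (a penalized logistic fit is another possible reading of "binary-Lasso"). The weights are assumed to act as per-feature multipliers on the L1 penalty, in the style of glmnet's `penalty.factor` argument. Features are centered but not standardized, and the small constant `eps`, which avoids dividing by zero when a coefficient is shrunk to exactly zero, is a hypothetical choice. The function names `weighted_lasso` and `shift_penalty_weights` are illustrative only.

```python
"""Sketch of the two-step weighting idea: (1) fit a Lasso that predicts
set membership (1 = testing input, 0 = training input); (2) refit a Lasso
on the training data with per-feature penalty weights 1 / |beta_j|.

Assumption: both steps use a plain least-squares Lasso solved by cyclic
coordinate descent; each weight multiplies that feature's L1 penalty.
"""
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def soft_threshold(z: float, t: float) -> float:
    """Soft-thresholding operator S(z, t) = sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def weighted_lasso(X: Matrix, y: Sequence[float], lam: float,
                   weights: Sequence[float], n_iter: int = 200
                   ) -> Tuple[float, List[float]]:
    """Minimize (1/(2n))||y - b0 - X b||^2 + lam * sum_j weights[j] * |b_j|."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    # Center X and y so the intercept can be recovered at the end.
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    resid = [v - y_mean for v in y]  # residual for beta = 0
    beta = [0.0] * p
    z = [sum(r[j] ** 2 for r in Xc) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if z[j] == 0.0:  # constant feature: keep its coefficient at 0
                continue
            # Correlation of feature j with the partial residual (j excluded).
            rho = sum(Xc[i][j] * (resid[i] + Xc[i][j] * beta[j])
                      for i in range(n)) / n
            new = soft_threshold(rho, lam * weights[j]) / z[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= Xc[i][j] * delta
                beta[j] = new
    intercept = y_mean - sum(b * m for b, m in zip(beta, x_mean))
    return intercept, beta


def shift_penalty_weights(X_train: Matrix, X_test: Matrix, lam: float,
                          eps: float = 1e-3) -> List[float]:
    """Step 1: regress 0/1 membership labels on the stacked inputs, then
    return 1 / (|beta_j| + eps) per feature (eps is a hypothetical guard)."""
    X = list(X_train) + list(X_test)
    labels = [0.0] * len(X_train) + [1.0] * len(X_test)
    _, beta = weighted_lasso(X, labels, lam, [1.0] * len(X[0]))
    return [1.0 / (abs(b) + eps) for b in beta]


if __name__ == "__main__":
    # Toy data: feature 0 separates training from testing inputs.
    X_train = [[0.0, 1.0], [0.0, -1.0], [0.2, 0.5], [0.1, -0.5]]
    y_train = [1.0, -1.0, 0.6, -0.4]
    X_test = [[1.0, 1.0], [1.0, -1.0]]
    w = shift_penalty_weights(X_train, X_test, lam=0.01)
    # Step 2: fit the weighted Lasso (WLasso) on the training set only.
    b0, beta = weighted_lasso(X_train, y_train, lam=0.01, weights=w)
    print("penalty weights:", w)
    print("intercept:", b0, "coefficients:", beta)
```

With a small `eps`, a feature whose membership coefficient is exactly zero receives a very large penalty and is effectively dropped from the final fit, much as glmnet treats variables listed in its `exclude` argument as having an infinite penalty factor. Unlike glmnet, this sketch does not rescale the weights to sum to the number of features. WRidge and WElastic Net would reuse the same weights with an L2 or mixed penalty, which the sketch does not implement.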