A Penalized Regression Method for Genomic Prediction Reduces Mismatch between Training and Testing Sets.

Author affiliations

Facultad de Telemática, Universidad de Colima, Colima 28040, Mexico.

Centro Universitario de Ciencias Exactas e Ingenierías (CUCEI), Universidad de Guadalajara, Guadalajara 44430, Mexico.

Publication information

Genes (Basel). 2024 Jul 23;15(8):969. doi: 10.3390/genes15080969.

DOI: 10.3390/genes15080969

PMID: 39202329

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11353568/

Abstract

Genomic selection (GS) is changing plant breeding by significantly reducing the resources needed for phenotyping. However, its accuracy can be compromised by mismatches between training and testing sets, which impact efficiency when the predictive model does not adequately reflect the genetic and environmental conditions of the target population. To address this challenge, this study introduces a straightforward method using binary-Lasso regression to estimate coefficients. In this approach, the response variable assigns 1 to testing set inputs and 0 to training set inputs. Subsequently, Lasso, Ridge, and Elastic Net regression models use the inverse of these coefficients (in absolute values) as weights during training (WLasso, WRidge, and WElastic Net). This weighting method gives less importance to features that discriminate more between training and testing sets. The effectiveness of this method is evaluated across six datasets, demonstrating consistent improvements in terms of the normalized root mean square error. Importantly, the model's implementation is facilitated using the glmnet library, which supports straightforward integration for weighting coefficients.
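The weighting scheme described in the abstract can be sketched roughly as follows. This is a NumPy stand-in written for illustration, not the authors' glmnet code. The abstract does not say how the weights enter the model or how zero coefficients are handled, so several choices here are assumptions: the weights are applied by rescaling marker columns, a small `eps` keeps zero coefficients finite, the weights are normalized to a maximum of 1, and only the ridge variant is shown. All function names and the synthetic data are hypothetical.

```python
import numpy as np


def soft_threshold(z, t):
    """Proximal operator of the L1 norm: shrink each entry toward zero by t."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def binary_lasso(X, d, lam=0.1, n_iter=500):
    """L1-penalized logistic regression of d on X, fitted by proximal gradient.

    d is 1 for testing-set rows and 0 for training-set rows.
    """
    n, p = X.shape
    b0, beta = 0.0, np.zeros(p)
    # Step size 1/L, with L bounded by (||X||_2^2 + n) / (4n) for the logistic loss.
    step = 4.0 * n / (np.linalg.norm(X, 2) ** 2 + n)
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-(b0 + X @ beta)))
        resid = prob - d
        b0 -= step * resid.mean()  # the intercept is not penalized
        beta = soft_threshold(beta - step * (X.T @ resid) / n, step * lam)
    return beta


def inverse_abs_weights(beta, eps=1e-3):
    """Weights proportional to 1/|beta|. eps and the rescaling are assumptions."""
    w = 1.0 / (np.abs(beta) + eps)
    return w / w.max()


def weighted_ridge_predict(X_train, y_train, X_test, w, lam=1.0):
    """Ridge regression on marker columns rescaled by w (assumed weighting)."""
    mu = X_train.mean(axis=0)
    Z_train = (X_train - mu) * w
    Z_test = (X_test - mu) * w
    y_mean = y_train.mean()
    p = Z_train.shape[1]
    coef = np.linalg.solve(Z_train.T @ Z_train + lam * np.eye(p),
                           Z_train.T @ (y_train - y_mean))
    return y_mean + Z_test @ coef


# Synthetic example: marker 0 is distributed differently in the testing set.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(80, 10))
X_test = rng.normal(size=(40, 10))
X_test[:, 0] += 2.0
y_train = X_train @ rng.normal(size=10) + rng.normal(size=80)

# Step 1: label rows by set membership and fit the binary Lasso.
X_all = np.vstack([X_train, X_test])
d = np.concatenate([np.zeros(80), np.ones(40)])
beta = binary_lasso(X_all, d)

# Step 2: turn |beta| into weights and train the weighted model.
w = inverse_abs_weights(beta)
y_pred = weighted_ridge_predict(X_train, y_train, X_test, w)
```

Under this reading, a marker that separates the two sets strongly gets a large |beta| and therefore a small weight, which matches the abstract's statement that such features receive less importance. For ridge, rescaling column j by w_j is equivalent to multiplying that coefficient's penalty by 1/w_j². glmnet accepts per-coefficient penalty multipliers through its `penalty.factor` argument, but the abstract does not state whether the authors use it this way.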

Figure (image from the article): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/84bd/11353568/06eb967215ec/genes-15-00969-g001.jpg
