The effect of different approaches to determining the regularization parameter of Bayesian LASSO on the accuracy of genomic prediction.

Author information

Sahebalam Hamid, Gholizadeh Mohsen, Hafezian Seyed Hassan

Affiliations

Department of Animal Science, Faculty of Animal and Aquatic Science, Sari Agricultural Sciences and Natural Resources University, Sari, Iran.

Publication information

Mamm Genome. 2025 Mar;36(1):331-345. doi: 10.1007/s00335-024-10088-7. Epub 2024 Dec 11.

DOI:10.1007/s00335-024-10088-7

PMID:39661159

Abstract

Using dense genomic markers opens up new opportunities and challenges for breeding programs. The need to penalize marker-specific regression coefficients becomes particularly important when dense markers are available. Therefore, fitting the marker effects to observations using a regularization technique, such as Bayesian LASSO (BL) regression, is of great interest. When the Laplace prior distribution is applied to the regression coefficients, BL can be interpreted as a Bayesian form of L1-norm regularization. A critical issue is the appropriate selection of hyperparameter values in the prior distributions of regularization techniques, as these values essentially control the sparsity of the estimated model. The purpose of this study was to evaluate different approaches for selecting the regularization parameter in BL: fully Bayesian approaches, such as gamma prior (BL_Gamma), beta prior (BL_Beta), and fixed prior (BL_Fixed), as well as data-driven approaches such as cross-validation based on mean square error (BL_CV_MSE) and prediction accuracy (BL_CV_PA). Additionally, information-criteria-based methods, including Akaike's information criterion (BL_AIC), Bayesian information criterion (BL_BIC), and deviance information criterion (BL_DIC), were explored. For this purpose, a genome containing eight chromosomes (each 1 Morgan in length) with 100 randomly distributed quantitative trait loci was simulated. The studied scenarios were as follows: scenario 1 involved 4000 markers and heritability of 0.2; scenario 2 involved 4000 markers and heritability of 0.6; scenario 3 involved 16,000 markers and heritability of 0.2; and scenario 4 involved 16,000 markers and heritability of 0.6. The results showed that among the fully Bayesian and cross-validation approaches, BL_Gamma, BL_Beta, and BL_CV_MSE provided the highest prediction accuracy (PA) in scenarios 1 and 3. With increased marker density and heritability (scenario 4), the cross-validation approaches performed slightly better. The information-criteria-based methods demonstrated the lowest PA. Increasing heritability decreased the model penalty on the regression coefficients, whereas increasing marker density increased it. The PA obtained in the target population ranged from 0.210 to 0.413 in scenario 1, 0.402 to 0.600 in scenario 2, 0.256 to 0.442 in scenario 3, and 0.478 to 0.653 in scenario 4. In general, fully Bayesian approaches based on random priors for the regularization parameter are recommended for BL, as they provide acceptable PA with lower computational loads.
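The two cross-validation criteria named in the abstract (mean square error and prediction accuracy) can be illustrated with a minimal sketch. It is not the authors' implementation: it replaces the Bayesian LASSO sampler with an ordinary L1-penalized regression solved by coordinate descent, which corresponds to the posterior mode under a Laplace prior, and it uses toy dimensions rather than the simulated genome described above. The function names (`simulate`, `lasso_cd`, `cv_select_lambda`), the lambda grid, and all simulation settings are assumptions made for illustration.

```python
import numpy as np

def soft_threshold(z, t):
    # Shrink z toward zero by t; values within [-t, t] become exactly zero.
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=50):
    # Minimise (1/2n)||y - Xb||^2 + lam * ||b||_1 by cyclic coordinate descent.
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.copy()  # running residual y - Xb
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # constant column after centring carries no signal
            rho = X[:, j] @ r / n + col_sq[j] * b[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            r += X[:, j] * (b[j] - new)
            b[j] = new
    return b

def cv_select_lambda(X, y, lambdas, k=5, seed=0):
    # k-fold CV; returns the lambda minimising MSE and the one maximising
    # the correlation between predicted and observed phenotypes.
    lambdas = np.asarray(lambdas, dtype=float)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    mse = np.zeros(len(lambdas))
    acc = np.zeros(len(lambdas))
    for test in folds:
        train = np.setdiff1d(np.arange(len(y)), test)
        mu_x, mu_y = X[train].mean(axis=0), y[train].mean()
        Xtr, ytr = X[train] - mu_x, y[train] - mu_y
        for i, lam in enumerate(lambdas):
            b = lasso_cd(Xtr, ytr, lam)
            pred = (X[test] - mu_x) @ b + mu_y
            mse[i] += np.mean((y[test] - pred) ** 2) / k
            if pred.std() > 0:  # correlation is undefined for constant predictions
                acc[i] += np.corrcoef(pred, y[test])[0, 1] / k
    return lambdas[np.argmin(mse)], lambdas[np.argmax(acc)], mse, acc

def simulate(n=150, p=100, n_qtl=10, h2=0.6, seed=1):
    # Toy data: genotypes coded 0/1/2, a few causal markers, noise scaled to h2.
    rng = np.random.default_rng(seed)
    X = rng.binomial(2, 0.5, size=(n, p)).astype(float)
    beta = np.zeros(p)
    beta[rng.choice(p, n_qtl, replace=False)] = rng.normal(size=n_qtl)
    g = X @ beta
    e = rng.normal(scale=np.sqrt(g.var() * (1 - h2) / h2), size=n)
    return X, g + e

X, y = simulate()
grid = [0.01, 0.05, 0.1, 0.2, 0.5]  # hypothetical lambda grid
lam_mse, lam_pa, mse, acc = cv_select_lambda(X, y, grid)
print("lambda chosen by MSE:", lam_mse)
print("lambda chosen by prediction accuracy:", lam_pa)
```

The two criteria need not select the same value, because MSE is sensitive to the scale of the predictions whereas the correlation is not. Each grid point requires k model fits, which is one way to see why cross-validation carries a higher computational load than placing a prior on the regularization parameter.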




