Prediction performance of linear models and gradient boosting machine on complex phenotypes in outbred mice.

Affiliations

Hendrix Genetics B.V., Research and Technology Center (RTC), 5830 AC Boxmeer, The Netherlands.

The Jackson Laboratory, Bar Harbor, ME 04609, USA.

Publication information

G3 (Bethesda). 2022 Apr 4;12(4). doi: 10.1093/g3journal/jkac039.

DOI:10.1093/g3journal/jkac039

PMID:35166767

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8982369/

Abstract

We compared the performance of linear (GBLUP, BayesB, and elastic net) methods to a nonparametric tree-based ensemble (gradient boosting machine) method for genomic prediction of complex traits in mice. The dataset used contained genotypes for 50,112 SNP markers and phenotypes for 835 animals from 6 generations. Traits analyzed were bone mineral density, body weight at 10, 15, and 20 weeks, fat percentage, circulating cholesterol, glucose, insulin, triglycerides, and urine creatinine. The youngest generation was used as a validation subset, and predictions were based on all older generations. Model performance was evaluated by comparing predictions for animals in the validation subset against their adjusted phenotypes. Linear models outperformed gradient boosting machine for 7 out of 10 traits. For bone mineral density, cholesterol, and glucose, the gradient boosting machine model showed better prediction accuracy and lower relative root mean squared error than the linear models. Interestingly, for these 3 traits, there is evidence of a relevant portion of phenotypic variance being explained by epistatic effects. Using a subset of top markers selected from a gradient boosting machine model helped for some of the traits to improve the accuracy of prediction when these were fitted into linear and gradient boosting machine models. Our results indicate that gradient boosting machine is more strongly affected by data size and decreased connectedness between reference and validation sets than the linear models. Although the linear models outperformed gradient boosting machine for the polygenic traits, our results suggest that gradient boosting machine is a competitive method to predict complex traits with assumed epistatic effects.
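The validation scheme described in the abstract (hold out the youngest generation, train on all older ones, then compare predictions against adjusted phenotypes) can be sketched as below. This is a minimal illustration, not the authors' code. The function names are hypothetical, generations are assumed to be coded as integers where a larger number means a younger generation, prediction accuracy is assumed to be the Pearson correlation between predicted and adjusted phenotypes, and the relative RMSE is assumed to be the RMSE divided by the standard deviation of the observed values. The abstract does not state how the paper normalises RMSE.

```python
import math
from typing import List, Sequence, Tuple


def split_by_generation(generations: Sequence[int]) -> Tuple[List[int], List[int]]:
    """Return (reference_idx, validation_idx).

    Assumption: larger generation number = younger animals. The youngest
    generation is held out for validation; all older ones form the reference set.
    """
    youngest = max(generations)
    reference = [i for i, g in enumerate(generations) if g < youngest]
    validation = [i for i, g in enumerate(generations) if g == youngest]
    return reference, validation


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation, used here as the (assumed) accuracy measure."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def relative_rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """RMSE divided by the SD of the observed values (assumed normalisation)."""
    n = len(observed)
    rmse = math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)
    mean_obs = sum(observed) / n
    sd_obs = math.sqrt(sum((o - mean_obs) ** 2 for o in observed) / n)
    return rmse / sd_obs


# Toy usage: six animals from three generations (made-up labels, not study data).
toy_generations = [1, 1, 2, 2, 3, 3]
reference_idx, validation_idx = split_by_generation(toy_generations)
```

Any model (GBLUP, BayesB, elastic net, or a gradient boosting machine) would be fitted on the animals in `reference_idx`, and `pearson` and `relative_rmse` would then be computed on the animals in `validation_idx` only.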

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/902f/8982369/395ea1f6d53e/jkac039f1.jpg
