Enhancing Genomic Prediction Accuracy of Reproduction Traits in Rongchang Pigs Through Machine Learning.

Author information

Wang Junge, Chai Jie, Chen Li, Zhang Tinghuan, Long Xi, Diao Shuqi, Chen Dong, Guo Zongyi, Tang Guoqing, Wu Pingxian

Affiliations

Farm Animal Genetic Resources Exploration and Innovation Key Laboratory of Sichuan Province, Sichuan Agricultural University, Chengdu 611130, China.

Chongqing Academy of Animal Sciences, Chongqing 402460, China.

Publication information

Animals (Basel). 2025 Feb 12;15(4):525. doi: 10.3390/ani15040525.

Abstract

The increasing volume of genome sequencing data challenges traditional genome-wide prediction methods, which struggle to handle large datasets. Machine learning (ML) techniques, which can process high-dimensional data, offer promising solutions. This study aimed to identify a genome-wide prediction method for local pig breeds, using 10 datasets with varying SNP densities derived from imputed sequencing data of 515 Rongchang pigs and the Pig QTL database. Three reproduction traits (litter weight, total number of piglets born, and number of piglets born alive) were predicted using six traditional methods and five ML methods: kernel ridge regression, random forest, Gradient Boosting Decision Tree (GBDT), Light Gradient Boosting Machine, and AdaBoost. The methods' efficacy was evaluated using fivefold cross-validation and independent tests. The predictive performance of both traditional and ML methods initially increased with SNP density, peaking at 800-900 k SNPs. ML methods outperformed traditional ones, showing improvements of 0.4-4.1%. The integration of GWAS and the Pig QTL database enhanced ML robustness. ML models exhibited superior generalizability, with high correlation coefficients (0.935-0.998) between cross-validation and independent test results. GBDT and random forest showed high computational efficiency, making them promising methods for genomic prediction in livestock breeding.
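
The evaluation scheme named in the abstract, fivefold cross-validation of a kernel ridge regression predictor, can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration and not the authors' pipeline: the genotypes and phenotypes are synthetic, the kernel is linear, the penalty `lam` is arbitrary, accuracy is taken to be the Pearson correlation between predicted and observed phenotypes (the abstract does not state the metric), and all function names are hypothetical.

```python
import random

def pearson(a, b):
    """Pearson correlation between two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

def linear_kernel(u, v):
    """Dot product of two genotype vectors (SNPs coded 0/1/2)."""
    return sum(x * y for x, y in zip(u, v))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def krr_predict(X_train, y_train, X_test, lam):
    """Kernel ridge regression: alpha = (K + lam*I)^-1 (y - mean(y))."""
    mean_y = sum(y_train) / len(y_train)
    n = len(X_train)
    K = [[linear_kernel(X_train[i], X_train[j]) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    alpha = solve(K, [v - mean_y for v in y_train])
    return [mean_y + sum(a * linear_kernel(x, xt) for a, xt in zip(alpha, X_train))
            for x in X_test]

def five_fold_accuracy(X, y, lam=10.0, k=5, seed=0):
    """Return the per-fold Pearson correlation on held-out animals."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    accs = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in idx if i not in held_out]
        preds = krr_predict([X[i] for i in train_idx],
                            [y[i] for i in train_idx],
                            [X[i] for i in test_idx], lam)
        accs.append(pearson(preds, [y[i] for i in test_idx]))
    return accs

# Synthetic data (assumption): 100 animals, 50 SNPs, 10 causal SNPs.
rng = random.Random(42)
n_animals, n_snps = 100, 50
X = [[rng.choice((0, 1, 2)) for _ in range(n_snps)] for _ in range(n_animals)]
effects = [(1.0 if j % 2 == 0 else -1.0) if j < 10 else 0.0 for j in range(n_snps)]
y = [sum(g * e for g, e in zip(row, effects)) + rng.gauss(0.0, 1.0) for row in X]

accs = five_fold_accuracy(X, y)
print("per-fold accuracy:", [round(a, 3) for a in accs])
print("mean accuracy:", round(sum(accs) / len(accs), 3))
```

With a linear kernel this predictor is equivalent to ridge regression on the SNP genotypes; a real analysis at the SNP densities described above would rely on an optimized numerical library rather than pure Python, and would tune `lam` inside the training folds.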

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c46b/11852217/23b8c45ee485/animals-15-00525-g001.jpg
