Zhang Hao, Bao Shiqian, Zhao Xiaona, Bai Yangfan, Lv Yangcheng, Gao Pengfei, Li Fuzhong, Zhang Wuping
School of Software Technology, Shanxi Agricultural University, Jinzhong 030801, China.
Animals (Basel). 2024 Nov 21;14(23):3348. doi: 10.3390/ani14233348.
In a study of 385 Large White pigs, a genome-wide association study (GWAS) was conducted on two reproductive traits: the number of healthy litters (NH) and the number of weaned litters (NW). Several single-nucleotide polymorphism (SNP) loci, including ALGA0098819, ALGA0037969, and H3GA0032302, were significantly associated with these traits. In the combined-parity analysis, candidate genes such as , , , and  were identified. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses showed that these genes are involved in key biological processes, including organic synthesis, regulation of sperm activity, spermatogenesis, and meiosis. In the by-parity analysis, the  gene was significantly associated with NW in the second and fourth parities, while , , and  were linked to cell proliferation, DNA repair, and metabolism, suggesting a potential role in regulating reproductive traits. These findings provide new molecular markers for genetic studies of reproductive traits in Large White pigs. For phenotypic prediction of NH and NW, four machine learning models (gradient boosting decision tree (GBDT), random forest (RF), LightGBM, and Adaboost.R2) and three traditional models (genomic best linear unbiased prediction (GBLUP), Bayesian ridge regression (BRR), and Bayesian LASSO (BL)) were evaluated using SNP subsets of varying proportions. After principal component analysis (PCA), GBDT achieved the highest Pearson correlation coefficient (PCC) for NH (0.141), and LightGBM achieved the highest PCC for NW (0.146). In terms of mean absolute error (MAE), mean squared error (MSE), and root mean squared error (RMSE), the traditional models showed stable errors, whereas the machine learning models performed comparatively better across the different SNP proportions. Overall, PCA improved the predictive performance of all models somewhat, although the gain in accuracy was limited.
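The abstract does not state which association model was used. As a simplified illustration of how a GWAS flags individual SNP loci, the sketch below regresses the phenotype on each SNP's additive allele count (0, 1, 2) by ordinary least squares and keeps SNPs that pass a Bonferroni threshold. It is an assumption-laden stand-in, not the study's method: a real pig GWAS would typically fit a mixed model with a genomic relationship matrix to control population structure, and that correction is deliberately omitted here. The function names `single_snp_test` and `gwas_scan`, the SNP labels `snp_1`/`snp_2`, and the toy data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def single_snp_test(genotypes: Sequence[float], phenotypes: Sequence[float]) -> Tuple[float, float]:
    """Fit y = a + b * g by ordinary least squares for one SNP.

    Returns (b, p), where p is a two-sided p-value from a normal
    approximation to the t statistic (adequate for a few hundred animals).
    """
    n = len(genotypes)
    mean_g = sum(genotypes) / n
    mean_y = sum(phenotypes) / n
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    if sxx == 0:
        return 0.0, 1.0  # monomorphic SNP carries no information
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in zip(genotypes, phenotypes))
    b = sxy / sxx
    a = mean_y - b * mean_g
    rss = sum((y - a - b * g) ** 2 for g, y in zip(genotypes, phenotypes))
    se = math.sqrt(rss / (n - 2) / sxx)
    if se == 0:
        return b, (1.0 if b == 0 else 0.0)  # perfect fit: no residual variance
    t = b / se
    return b, math.erfc(abs(t) / math.sqrt(2))  # P(|Z| > |t|)


def gwas_scan(
    snp_ids: Sequence[str],
    genotype_matrix: Sequence[Sequence[float]],
    phenotypes: Sequence[float],
    alpha: float = 0.05,
) -> List[Tuple[str, float, float]]:
    """Test every SNP and keep those passing a Bonferroni threshold.

    genotype_matrix[i][j] is the allele count (0, 1 or 2) of SNP j in animal i.
    """
    threshold = alpha / len(snp_ids)  # genome-wide Bonferroni cut-off
    hits = []
    for j, snp in enumerate(snp_ids):
        column = [row[j] for row in genotype_matrix]
        effect, p = single_snp_test(column, phenotypes)
        if p < threshold:
            hits.append((snp, effect, p))
    return sorted(hits, key=lambda hit: hit[2])  # most significant first


if __name__ == "__main__":
    # Hypothetical toy data: 100 animals, 2 SNPs; only snp_1 affects the trait.
    rows = [[i % 3, (i // 3) % 3] for i in range(100)]
    trait = [2 * r[0] + (0.1 if i % 2 else -0.1) for i, r in enumerate(rows)]
    for snp, effect, p in gwas_scan(["snp_1", "snp_2"], rows, trait):
        print(snp, round(effect, 3), p)
```

The Bonferroni threshold divides the significance level by the number of SNPs tested, so it grows stricter as marker density increases; published studies often use less conservative alternatives, which this sketch does not attempt to reproduce.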
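The prediction results were reported after PCA processing of the SNP data, but the abstract does not say how many components were kept or which implementation was used. The sketch below shows the general idea under those unknowns: centre each SNP column, form the covariance matrix, extract the leading eigenvectors by power iteration with deflation, and project each animal onto them to obtain lower-dimensional features for a downstream model. It uses only the standard library for clarity; with tens of thousands of SNPs a real pipeline would instead call an SVD routine such as `numpy.linalg.svd` or scikit-learn's `PCA`. All function names here are hypothetical.

```python
import math
import random
from typing import List, Sequence

Matrix = List[List[float]]


def center_columns(x: Sequence[Sequence[float]]) -> Matrix:
    """Subtract each SNP's mean genotype so PCA captures variance, not level."""
    n = len(x)
    means = [sum(col) / n for col in zip(*x)]
    return [[v - m for v, m in zip(row, means)] for row in x]


def covariance(xc: Matrix) -> Matrix:
    """Sample covariance (p x p) of a column-centred n x p matrix."""
    n, p = len(xc), len(xc[0])
    return [
        [sum(xc[k][i] * xc[k][j] for k in range(n)) / (n - 1) for j in range(p)]
        for i in range(p)
    ]


def top_components(cov: Matrix, k: int, iters: int = 500, seed: int = 0) -> Matrix:
    """Leading eigenvectors of a symmetric PSD matrix via power iteration.

    Each found component is removed by deflation (A <- A - lambda v v^T).
    Fewer than k vectors are returned if the remaining variance is exactly zero.
    """
    rng = random.Random(seed)
    p = len(cov)
    a = [row[:] for row in cov]  # working copy so the caller's matrix is not modified
    comps: Matrix = []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(p)]
        for _ in range(iters):
            w = [sum(a[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0:
                return comps  # nothing left to explain
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue (variance) along v.
        lam = sum(v[i] * sum(a[i][j] * v[j] for j in range(p)) for i in range(p))
        comps.append(v)
        a = [[a[i][j] - lam * v[i] * v[j] for j in range(p)] for i in range(p)]
    return comps


def pca_scores(x: Sequence[Sequence[float]], k: int) -> Matrix:
    """Project each animal's genotypes onto the top-k principal components."""
    xc = center_columns(x)
    comps = top_components(covariance(xc), k)
    return [[sum(r * c for r, c in zip(row, comp)) for comp in comps] for row in xc]
```

The rows returned by `pca_scores` would replace the raw SNP columns as model inputs; the sign of each component is arbitrary, which does not affect tree-based or linear models.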
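The four evaluation metrics named in the abstract can be computed from observed and predicted phenotypes as in the sketch below. PCC measures how well predictions rank and track the observed values (higher is better), while MAE, MSE, and RMSE measure the size of the errors (lower is better); because PCC ignores constant offsets and scale, a model can rank differently on the two kinds of metric. The function name `evaluate` is hypothetical, and the abstract does not describe the cross-validation scheme that produced the reported values.

```python
import math
from typing import Dict, Sequence


def evaluate(observed: Sequence[float], predicted: Sequence[float]) -> Dict[str, float]:
    """Return PCC, MAE, MSE and RMSE between observed and predicted phenotypes."""
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    # PCC is undefined when either vector is constant.
    pcc = cov / math.sqrt(var_o * var_p) if var_o > 0 and var_p > 0 else float("nan")
    errors = [p - o for o, p in zip(observed, predicted)]
    mse = sum(e * e for e in errors) / n
    return {
        "PCC": pcc,
        "MAE": sum(abs(e) for e in errors) / n,
        "MSE": mse,
        "RMSE": math.sqrt(mse),
    }
```

For example, with hypothetical values `evaluate([10, 12, 11, 9], [10.5, 11.5, 11, 9.5])`, the errors are 0.5, -0.5, 0, and 0.5, giving an MAE of 0.375 and an MSE of 0.1875.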