A Self-Supervised Pre-Trained Transformer Model for Accurate Genomic Prediction of Swine Phenotypes.

Authors

Xiang Weixi, Li Zhaoxin, Sun Qixin, Chai Xiujuan, Sun Tan

Affiliations

Agricultural Information Institute, Chinese Academy of Agricultural Sciences, Beijing 100081, China.

Key Laboratory of Agricultural Big Data, Ministry of Agriculture and Rural Affairs, Beijing 100081, China.

Publication

Animals (Basel). 2025 Aug 24;15(17):2485. doi: 10.3390/ani15172485.

DOI: 10.3390/ani15172485
PMID: 40941280
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12427331/
Abstract

Accurate genomic prediction of complex phenotypes is crucial for accelerating genetic progress in swine breeding. However, conventional methods like Genomic Best Linear Unbiased Prediction (GBLUP) face limitations in capturing complex non-additive effects that contribute significantly to phenotypic variation, restricting the potential accuracy of phenotype prediction. To address this challenge, we introduce a novel framework based on a self-supervised, pre-trained encoder-only Transformer model. Its core novelty lies in tokenizing SNP sequences into non-overlapping 6-mers (sequences of 6 SNPs), enabling the model to directly learn local haplotype patterns instead of treating SNPs as independent markers. The model first undergoes self-supervised pre-training on the unlabeled version of the same SNP dataset used for subsequent fine-tuning, learning intrinsic genomic representations through a masked 6-mer prediction task. Subsequently, the pre-trained model is fine-tuned on labeled data to predict phenotypic values for specific economic traits. Experimental validation demonstrates that our proposed model consistently outperforms baseline methods, including GBLUP and a Transformer of the same architecture trained from scratch (without pre-training), in prediction accuracy across key economic traits. This outperformance suggests the model's capacity to capture non-linear genetic signals missed by linear models. This research not only contributes a new, more accurate methodology for genomic phenotype prediction but also validates the potential of self-supervised learning to decipher complex genomic patterns for direct application in breeding programs. Ultimately, this approach offers a powerful new tool to enhance the rate of genetic gain in swine production by enabling more precise selection based on predicted phenotypes.
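
The tokenization and masking steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract states only that SNPs are grouped into non-overlapping 6-mers and that pre-training uses masked 6-mer prediction. Everything else here is an assumption made for illustration: the 0/1/2 genotype coding, the base-3 token ids, dropping trailing SNPs that do not fill a full 6-mer, the 15% mask rate, and the names `tokenize_snps`, `mask_tokens`, `MASK_ID`, and `IGNORE`.

```python
import random

K = 6                        # SNPs per token, as stated in the abstract
N_GENOTYPES = 3              # assumption: genotypes coded as 0/1/2 allele dosage
MASK_ID = N_GENOTYPES ** K   # assumption: one extra id (729) reserved for [MASK]
IGNORE = -100                # assumption: label for positions that are not scored


def tokenize_snps(genotypes, k=K):
    """Split a genotype vector into non-overlapping k-mers and map each to an integer id."""
    usable = len(genotypes) - len(genotypes) % k  # assumption: drop the trailing remainder
    tokens = []
    for start in range(0, usable, k):
        token_id = 0
        for g in genotypes[start:start + k]:
            if g not in (0, 1, 2):
                raise ValueError(f"unexpected genotype code: {g!r}")
            token_id = token_id * N_GENOTYPES + g  # base-3 positional encoding
        tokens.append(token_id)
    return tokens


def mask_tokens(tokens, mask_rate=0.15, rng=None):
    """Replace a random subset of tokens with MASK_ID; return (inputs, labels)."""
    if not tokens:
        return [], []
    rng = rng or random.Random()
    n_mask = max(1, round(len(tokens) * mask_rate))
    positions = set(rng.sample(range(len(tokens)), n_mask))
    # Masked positions carry the original token as the label; the rest are ignored.
    inputs = [MASK_ID if i in positions else t for i, t in enumerate(tokens)]
    labels = [t if i in positions else IGNORE for i, t in enumerate(tokens)]
    return inputs, labels


if __name__ == "__main__":
    snps = [0, 1, 2, 0, 1, 2, 2, 2, 2, 2, 2, 2, 0, 0]  # 14 SNPs: two 6-mers, 2 dropped
    tokens = tokenize_snps(snps)
    print(tokens)
    print(mask_tokens(tokens, rng=random.Random(0)))
```

Under these assumptions each 6-mer maps to one of 3^6 = 729 ids, so the model sees a local haplotype pattern as a single token rather than six independent markers. During pre-training an encoder would be trained to recover the labels at the masked positions.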

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/611d/12427331/fce64f4ba0d3/animals-15-02485-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/611d/12427331/5abcdbd46dd7/animals-15-02485-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/611d/12427331/56bee8bc422f/animals-15-02485-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/611d/12427331/54fc43bdbabc/animals-15-02485-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/611d/12427331/9653cc0aedcd/animals-15-02485-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/611d/12427331/1ba5c18b9f3b/animals-15-02485-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/611d/12427331/abfd8164160f/animals-15-02485-g007.jpg
