Enhancing personalized gene expression prediction from DNA sequences using genomic foundation models.

Affiliations

Division of Biostatistics and Health Data Science, University of Minnesota, Minneapolis, MN, USA.

Publication information

HGG Adv. 2024 Oct 10;5(4):100347. doi: 10.1016/j.xhgg.2024.100347. Epub 2024 Aug 27.

DOI:10.1016/j.xhgg.2024.100347
PMID:39205391
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11416237/
Abstract

Artificial intelligence (AI)/deep learning (DL) models that predict molecular phenotypes like gene expression directly from DNA sequences have recently emerged. While these models have proven effective at capturing the variation across genes, their ability to explain inter-individual differences has been limited. We hypothesize that the performance gap can be narrowed through the use of pre-trained embeddings from the Nucleotide Transformer, a large foundation model trained on 3,000+ genomes. We train a transformer model using the pre-trained embeddings and compare its predictive performance to Enformer, the current state-of-the-art model, using genotype and expression data from 290 individuals. Our model significantly outperforms Enformer in terms of correlation across individuals, and narrows the performance gap with an elastic net regression approach that uses just the genetic variants as predictors. Although simple regression models have their advantages in personalized prediction tasks, DL approaches based on foundation models pre-trained on diverse genomes have unique strengths in flexibility and interpretability. With further methodological and computational improvements and more training data, these models may eventually predict molecular phenotypes from DNA sequences with an accuracy surpassing that of regression-based approaches. Our work demonstrates the potential for large pre-trained AI/DL models to advance functional genomics.
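The abstract turns on a distinction between two evaluation axes: how well predictions track differences between genes, and how well they track differences between individuals for the same gene. The sketch below, using only the Python standard library, illustrates why a model can score well on the first axis and fail on the second. It is not the authors' evaluation code. The helper names (`pearson`, `across_gene_r`, `across_individual_r`), the data layout (gene mapped to a list of values, one per individual in a fixed order), the choice to summarize the across-gene axis by correlating per-gene means, and all toy numbers are hypothetical.

```python
import math
from statistics import fmean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences.

    Returns NaN when either sequence is constant, because the
    correlation is undefined in that case.
    """
    mx, my = fmean(x), fmean(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    sxy = sum(a * b for a, b in zip(dx, dy))
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def across_gene_r(pred, obs):
    """Correlate per-gene means (averaged over individuals) of predictions vs. observations."""
    genes = sorted(obs)
    return pearson([fmean(pred[g]) for g in genes], [fmean(obs[g]) for g in genes])


def across_individual_r(pred, obs):
    """For each gene, correlate predicted and observed expression across individuals."""
    return {g: pearson(pred[g], obs[g]) for g in obs}


# Hypothetical toy data: gene -> expression in three individuals (same order in every list).
observed = {
    "geneA": [1.0, 2.0, 3.0],
    "geneB": [10.0, 11.0, 12.0],
    "geneC": [5.0, 4.0, 6.0],
}

# Extreme case: predicts each gene's mean for everyone, ignoring individual genotypes.
constant_pred = {g: [fmean(v)] * len(v) for g, v in observed.items()}

# A predictor that also tracks individual-level differences (values are made up).
personal_pred = {
    "geneA": [1.1, 1.9, 3.2],
    "geneB": [10.2, 10.8, 12.1],
    "geneC": [4.9, 4.2, 5.8],
}

if __name__ == "__main__":
    for name, pred in [("constant", constant_pred), ("personal", personal_pred)]:
        print(name, "across genes:", round(across_gene_r(pred, observed), 3))
        per_gene = across_individual_r(pred, observed)
        print(name, "across individuals:", {g: round(r, 3) for g, r in per_gene.items()})
```

The constant predictor is an extreme stand-in for a model that captures between-gene differences but none of the inter-individual variation: it matches the gene means exactly, yet its per-gene correlation across individuals is undefined. Averaging per-gene correlations across individuals, as in this sketch, is one plausible way to summarize the personalized axis, not necessarily the statistic reported in the paper.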

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7bad/11416237/a04392febf9f/gr16.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7bad/11416237/d22cfeaadc26/gr1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7bad/11416237/bb9fed210b64/gr2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7bad/11416237/fbad4a777f82/gr3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7bad/11416237/78f216ce4927/gr4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7bad/11416237/eea6e1dac88a/gr5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7bad/11416237/5c231b6fa37c/gr6.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7bad/11416237/e175d9b58ae2/gr7.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7bad/11416237/ac25b8f19a76/gr8.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7bad/11416237/42855a318da4/gr9.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7bad/11416237/b18e1c8e24f3/gr10.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7bad/11416237/d5c097a5c17a/gr11.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7bad/11416237/3dcd4adbb3a4/gr12.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7bad/11416237/fe306d521ba9/gr13.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7bad/11416237/913262e4c95d/gr14.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7bad/11416237/84e6464414a1/gr15.jpg

Similar articles

1
Enhancing personalized gene expression prediction from DNA sequences using genomic foundation models.
HGG Adv. 2024 Oct 10;5(4):100347. doi: 10.1016/j.xhgg.2024.100347. Epub 2024 Aug 27.
2
Leveraging a foundation model zoo for cell similarity search in oncological microscopy across devices.
Front Oncol. 2025 Jun 18;15:1480384. doi: 10.3389/fonc.2025.1480384. eCollection 2025.
3
A deep learning approach to direct immunofluorescence pattern recognition in autoimmune bullous diseases.
Br J Dermatol. 2024 Jul 16;191(2):261-266. doi: 10.1093/bjd/ljae142.
4
Cost-effectiveness of using prognostic information to select women with breast cancer for adjuvant systemic therapy.
Health Technol Assess. 2006 Sep;10(34):iii-iv, ix-xi, 1-204. doi: 10.3310/hta10340.
5
Comparison of Two Modern Survival Prediction Tools, SORG-MLA and METSSS, in Patients With Symptomatic Long-bone Metastases Who Underwent Local Treatment With Surgery Followed by Radiotherapy and With Radiotherapy Alone.
Clin Orthop Relat Res. 2024 Dec 1;482(12):2193-2208. doi: 10.1097/CORR.0000000000003185. Epub 2024 Jul 23.
6
Can a Liquid Biopsy Detect Circulating Tumor DNA With Low-passage Whole-genome Sequencing in Patients With a Sarcoma? A Pilot Evaluation.
Clin Orthop Relat Res. 2025 Jan 1;483(1):39-48. doi: 10.1097/CORR.0000000000003161. Epub 2024 Jun 21.
7
Signs and symptoms to determine if a patient presenting in primary care or hospital outpatient settings has COVID-19.
Cochrane Database Syst Rev. 2022 May 20;5(5):CD013665. doi: 10.1002/14651858.CD013665.pub3.
8
The Use of AI for Phenotype-Genotype Mapping.
Methods Mol Biol. 2025;2952:369-410. doi: 10.1007/978-1-0716-4690-8_21.
9
Short-Term Memory Impairment
10
Gaps in Artificial Intelligence Research for Rural Health in the United States: A Scoping Review.
medRxiv. 2025 Jun 27:2025.06.26.25330361. doi: 10.1101/2025.06.26.25330361.

Cited by

1
Pre-training Genomic Language Model with Variants for Better Modeling Functional Genomics.
bioRxiv. 2025 Aug 23:2025.02.26.640468. doi: 10.1101/2025.02.26.640468.
2
Foundation models and intelligent decision-making: Progress, challenges, and perspectives.
Innovation (Camb). 2025 May 12;6(6):100948. doi: 10.1016/j.xinn.2025.100948. eCollection 2025 Jun 2.