Genomic language models: opportunities and challenges.

Authors

Benegas Gonzalo, Ye Chengzhong, Albors Carlos, Li Jianan Canal, Song Yun S

Affiliations

Computer Science Division, University of California, Berkeley, CA, USA.

Department of Statistics, University of California, Berkeley, CA, USA.

Publication information

Trends Genet. 2025 Apr;41(4):286-302. doi: 10.1016/j.tig.2024.11.013. Epub 2025 Jan 2.

DOI: 10.1016/j.tig.2024.11.013
PMID: 39753409

Abstract

Large language models (LLMs) are having transformative impacts across a wide range of scientific fields, particularly in the biomedical sciences. Just as the goal of natural language processing is to understand sequences of words, a major objective in biology is to understand biological sequences. Genomic language models (gLMs), which are LLMs trained on DNA sequences, have the potential to significantly advance our understanding of genomes and how DNA elements at various scales interact to give rise to complex functions. To showcase this potential, we highlight key applications of gLMs, including functional constraint prediction, sequence design, and transfer learning. Despite notable recent progress, however, developing effective and efficient gLMs presents numerous challenges, especially for species with large, complex genomes. Here, we discuss major considerations for developing and evaluating gLMs.
