A generative language model decodes contextual constraints on codon choice for mRNA design.

Author information

Faizi Marjan, Sakharova Helen, Lareau Liana F

Publication information

bioRxiv. 2025 Jun 6:2025.05.13.653614. doi: 10.1101/2025.05.13.653614.

Abstract

The genetic code allows multiple synonymous codons to encode the same amino acid, creating a vast sequence space for protein-coding regions. Codon choice can impact mRNA function and protein output, a consideration newly relevant with advances in mRNA technology. Genomes preferentially use some codons, but simple optimization methods that select preferred codons miss complex contextual patterns. We present Trias, an encoder-decoder language model trained on millions of eukaryotic coding sequences. Trias learns codon usage rules directly from sequence data, integrating local and global dependencies to generate species-specific codon sequences that align with biological constraints. Without explicit training on protein expression, Trias generates sequences and scores that correlate strongly with experimental measurements of mRNA stability, ribosome load, and protein output. The model outperforms commercial codon optimization tools in generating sequences resembling high-expression codon sequence variants. By modeling codon usage in context, Trias offers a data-driven framework for synthetic mRNA design and for understanding the molecular and evolutionary principles behind codon choice.
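
For context, the "simple optimization methods that select preferred codons" that the abstract contrasts with Trias can be sketched roughly as follows. This is not Trias and not any commercial tool: it is a minimal illustration of the context-free baseline, assuming the standard genetic code and a usage table estimated from a handful of reference coding sequences. The function names and the toy reference sequence are hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered by T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codon_counts(cds_list):
    """Count in-frame codons across a list of coding sequences (DNA alphabet)."""
    counts = Counter()
    for cds in cds_list:
        usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
        for i in range(0, usable, 3):
            counts[cds[i:i + 3]] += 1
    return counts


def preferred_codons(counts):
    """For each amino acid, pick the most frequently observed synonymous codon.

    Ties (including codons never observed) fall back to table order.
    """
    best = {}
    for codon, aa in CODON_TABLE.items():
        if aa not in best or counts[codon] > counts[best[aa]]:
            best[aa] = codon
    return best


def naive_back_translate(protein, preferred):
    """Context-free design: every residue gets its single preferred codon."""
    return "".join(preferred[aa] for aa in protein)


# Toy reference set (hypothetical; a real table would use many genes).
reference = ["ATGGCTGCTGCCAAATAA"]
preferred = preferred_codons(codon_counts(reference))
design = naive_back_translate("MAK", preferred)
```

Because each residue is mapped independently, this baseline always emits the same codon for a given amino acid regardless of its neighbors or position in the transcript, which is the kind of contextual information the abstract says such methods miss.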

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a6ac/12147546/27aad4697fdd/nihpp-2025.05.13.653614v2-f0001.jpg
