CodonTransformer: a multispecies codon optimizer using context-aware neural networks.

Authors

Fallahpour Adibvafa, Gureghian Vincent, Filion Guillaume J, Lindner Ariel B, Pandi Amir

Affiliations

Vector Institute for Artificial Intelligence, Toronto, ON, Canada.

University of Toronto Scarborough; Department of Biological Science, Scarborough, ON, Canada.

Publication information

Nat Commun. 2025 Apr 3;16(1):3205. doi: 10.1038/s41467-025-58588-7.

Abstract

Degeneracy in the genetic code allows many possible DNA sequences to encode the same protein. Optimizing codon usage within a sequence to meet organism-specific preferences faces combinatorial explosion. Nevertheless, natural sequences optimized through evolution provide a rich source of data for machine learning algorithms to explore the underlying rules. Here, we introduce CodonTransformer, a multispecies deep learning model trained on over 1 million DNA-protein pairs from 164 organisms spanning all domains of life. The model demonstrates context-awareness thanks to its Transformers architecture and to our sequence representation strategy that combines organism, amino acid, and codon encodings. CodonTransformer generates host-specific DNA sequences with natural-like codon distribution profiles and with minimum negative cis-regulatory elements. This work introduces the strategy of Shared Token Representation and Encoding with Aligned Multi-masking (STREAM) and provides a codon optimization framework with a customizable open-access model and a user-friendly Google Colab interface.
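To make the combinatorial explosion mentioned in the abstract concrete, the following minimal sketch counts how many distinct DNA sequences encode a given protein under the standard genetic code. It is an illustration only and is not part of CodonTransformer; the names `CODON_DEGENERACY` and `count_synonymous_sequences` are hypothetical, and the count ignores stop codons and any organism-specific genetic code variants.

```python
from math import prod

# Number of synonymous codons per amino acid in the standard genetic code
# (one-letter amino acid codes; stop codons excluded). The values sum to
# the 61 sense codons.
CODON_DEGENERACY = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}


def count_synonymous_sequences(protein: str) -> int:
    """Return the number of DNA sequences that encode `protein`.

    Hypothetical helper for illustration: the result is the product of
    the per-residue codon counts, so it grows exponentially with length.
    """
    return prod(CODON_DEGENERACY[aa] for aa in protein.upper())


if __name__ == "__main__":
    # Met has 1 codon, Leu has 6, Ser has 6: 1 * 6 * 6 = 36 encodings.
    print(count_synonymous_sequences("MLS"))
```

Because the count is a product over residues, even a short peptide has far more encodings than can be enumerated exhaustively, which is why a learned model is an attractive alternative to brute-force search.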

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/321a/11968976/955f488215f2/41467_2025_58588_Fig1_HTML.jpg
