ICOR: improving codon optimization with recurrent neural networks.

Affiliations

Lattice Automation, 709 E 5th St. #3, Boston, MA, 02127, USA.

Harvard Medical School, 25 Shattuck St, Boston, MA, 02115, USA.

Publication information

BMC Bioinformatics. 2023 Apr 4;24(1):132. doi: 10.1186/s12859-023-05246-8.

DOI:10.1186/s12859-023-05246-8
PMID:37016283
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10074884/
Abstract

BACKGROUND

Because there are 61 sense codons but only 20 standard amino acids, most amino acids are encoded by more than one codon. Although such synonymous codons do not alter the encoded amino acid sequence, their selection can dramatically affect the expression of the resulting protein. Codon optimization of synthetic DNA sequences is therefore important for heterologous expression. However, existing solutions are primarily based on choosing only high-frequency codons, neglecting the important effects of rare codons. In this paper, we propose ICOR, a novel recurrent neural network (RNN)-based codon optimization tool that aims to learn codon usage bias from a genomic dataset of Escherichia coli. We compile a dataset of over 7,000 non-redundant, high-expression, robust genes for deep learning. The model uses a bidirectional long short-term memory (BiLSTM)-based architecture, allowing the sequential context of codon usage in genes to be learned. Our tool can predict synonymous codons for synthetic genes toward optimal expression in Escherichia coli.
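The codon counts quoted in the background can be checked with a short sketch. It is an illustration only and is not part of ICOR: it builds the standard genetic code (NCBI translation table 1) from its compact string form and groups sense codons by amino acid. The names `CODON_TABLE` and `synonymous_codons` are hypothetical helpers introduced here.

```python
from collections import defaultdict
from itertools import product

# Standard genetic code (NCBI translation table 1). Codons are enumerated
# in the order TTT, TTC, TTA, TTG, TCT, ...; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(triplet): aa
    for triplet, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def synonymous_codons(table=CODON_TABLE):
    """Group sense codons by the amino acid they encode (stop codons skipped)."""
    groups = defaultdict(list)
    for codon, aa in table.items():
        if aa != "*":
            groups[aa].append(codon)
    return dict(groups)


groups = synonymous_codons()
n_sense = sum(len(codons) for codons in groups.values())
n_degenerate = sum(1 for codons in groups.values() if len(codons) > 1)
print(n_sense, len(groups), n_degenerate)
```

Under the standard code, only methionine and tryptophan have a single codon, which is why synonymous codon choice is available at almost every position of a gene.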

RESULTS

We demonstrate that the sequential context captured by an RNN may yield codon selection that is more similar to that of the host genome. Based on computational metrics that predict protein expression, ICOR theoretically optimizes protein expression more than frequency-based approaches. ICOR is evaluated on 1,481 Escherichia coli genes as well as a benchmark set of 40 select DNA sequences whose heterologous expression has been previously characterized. ICOR's performance is measured across five metrics: the Codon Adaptation Index, GC-content, negative repeat elements, negative cis-regulatory elements, and codon frequency distribution.
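Two of the five metrics, GC-content and the Codon Adaptation Index (CAI), are simple enough to sketch. The code below is a minimal illustration and is not the authors' evaluation code. It assumes the commonly used CAI definition: the geometric mean of each codon's relative adaptiveness, meaning its count in a reference gene set divided by the count of the most frequent synonymous codon. The function names, the 0.5 floor for unobserved codons, and the toy reference gene are all assumptions made for this example.

```python
import math
from collections import Counter, defaultdict
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(t): aa for t, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def split_codons(seq):
    """Split a coding sequence into complete codons, ignoring trailing bases."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def gc_content(seq):
    """Fraction of G and C bases in the sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def relative_adaptiveness(reference_genes, floor=0.5):
    """Weight per codon: its count divided by the top synonymous codon's count.

    Unobserved codons are floored at `floor` (an assumption) to avoid log(0).
    Stop codons and single-codon amino acids (Met, Trp) are left out.
    """
    counts = Counter(c for gene in reference_genes for c in split_codons(gene))
    by_aa = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            by_aa[aa].append(codon)
    weights = {}
    for synonyms in by_aa.values():
        if len(synonyms) == 1:
            continue
        adjusted = {c: max(counts[c], floor) for c in synonyms}
        best = max(adjusted.values())
        for codon, value in adjusted.items():
            weights[codon] = value / best
    return weights


def cai(seq, weights):
    """Geometric mean of the weights of all scored codons in `seq`."""
    logs = [math.log(weights[c]) for c in split_codons(seq) if c in weights]
    if not logs:
        raise ValueError("sequence contains no codons with a defined weight")
    return math.exp(sum(logs) / len(logs))


# Toy reference set: leucine is encoded three times by CTG and once by CTA.
toy_weights = relative_adaptiveness(["CTGCTGCTGCTA"])
print(cai("CTGCTG", toy_weights), cai("CTGCTA", toy_weights), gc_content("ATGC"))
```

A sequence built only from the most frequent synonymous codons scores a CAI of 1.0, which is why purely frequency-based optimizers tend to maximize this metric by construction.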

CONCLUSIONS

The results, based on in silico metrics, indicate that ICOR codon optimization is theoretically more effective at enhancing recombinant protein expression than other established codon optimization techniques. Our tool is provided as an open-source software package that includes the benchmark set of sequences used in this study.
