
Effective Gene Expression Prediction and Optimization from Protein Sequences.

Authors

Liu Tuoyu, Zhang Yiyang, Li Yanjun, Xu Guoshun, Gao Han, Wang Pengtao, Tu Tao, Luo Huiying, Wu Ningfeng, Yao Bin, Liu Bo, Guan Feifei, Huang Huoqing, Tian Jian

Affiliations

State Key Laboratory of Animal Nutrition and Feeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100193, China.

National Key Laboratory of Agricultural Microbiology, Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing, 100081, China.

Publication information

Adv Sci (Weinh). 2025 Feb;12(8):e2407664. doi: 10.1002/advs.202407664. Epub 2025 Jan 9.

DOI:10.1002/advs.202407664
PMID:39783932
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11848636/
Abstract

High soluble protein expression in heterologous hosts is crucial for various research and applications. Despite considerable research on the impact of codon usage on expression levels, the relationship between protein sequence and expression is often overlooked. In this study, a novel connection between protein expression and sequence is uncovered, leading to the development of SRAB (Strength of Relative Amino Acid Bias) based on AEI (Amino Acid Expression Index). The AEI served as an objective measure of this correlation, with higher AEI values enhancing soluble expression. Subsequently, the pre-trained protein model MP-TRANS (MindSpore Protein Transformer) is developed and fine-tuned using transfer learning techniques to create 88 prediction models (MPB-EXP) for predicting heterologous expression levels across 88 species. This approach achieved an average accuracy of 0.78, surpassing conventional machine learning methods. Additionally, a mutant generation model, MPB-MUT, is devised and utilized to enhance expression levels in specific hosts. Experimental validation demonstrated that the top 3 mutants of xylanase (previously not expressed in Escherichia coli) successfully achieved high-level soluble expression in E. coli. These findings highlight the efficacy of the developed model in predicting and optimizing gene expression based on protein sequences.
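The abstract does not give the formula for AEI or describe how MPB-MUT proposes mutants, so the following is only a minimal sketch of the general idea of scoring a protein sequence with a per-residue index and ranking substitutions by their effect on that score. It assumes a CAI-style geometric mean of per-amino-acid weights. The weight table, the function names `sequence_index` and `rank_single_mutants`, and the brute-force enumeration are all hypothetical illustrations, not the authors' method.

```python
import math

# Hypothetical per-residue weights in (0, 1]; these are NOT values from the paper.
EXAMPLE_WEIGHTS = {"A": 1.0, "K": 0.9, "E": 0.8, "L": 0.5, "W": 0.2}


def sequence_index(seq, weights):
    """Geometric mean of per-residue weights (an assumed, CAI-style score).

    Residues without a weight are skipped rather than penalized.
    """
    logs = [math.log(weights[aa]) for aa in seq.upper() if aa in weights]
    if not logs:
        raise ValueError("sequence contains no residues with a known weight")
    return math.exp(sum(logs) / len(logs))


def rank_single_mutants(seq, weights):
    """Enumerate every single substitution and sort by gain in the index.

    Returns (gain, label, mutant_sequence) tuples, best gain first.
    This brute-force search only stands in for a learned mutant generator.
    """
    seq = seq.upper()
    base = sequence_index(seq, weights)
    candidates = []
    for pos, old in enumerate(seq):
        for new in weights:
            if new == old:
                continue
            mutant = seq[:pos] + new + seq[pos + 1:]
            gain = sequence_index(mutant, weights) - base
            candidates.append((gain, f"{old}{pos + 1}{new}", mutant))
    candidates.sort(key=lambda item: item[0], reverse=True)
    return candidates


if __name__ == "__main__":
    for gain, label, mutant in rank_single_mutants("WALK", EXAMPLE_WEIGHTS)[:3]:
        print(f"{label}: {mutant} (gain {gain:+.3f})")
```

In practice a score like this would be one input among several. The paper's prediction models are host-specific fine-tuned transformers, which this sketch does not attempt to reproduce.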


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a3e0/11848636/79f38af4f620/ADVS-12-2407664-g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a3e0/11848636/0862b4cd1cf0/ADVS-12-2407664-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a3e0/11848636/13ada0b4f6e6/ADVS-12-2407664-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a3e0/11848636/683283e1cbeb/ADVS-12-2407664-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a3e0/11848636/863bdc823ab0/ADVS-12-2407664-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a3e0/11848636/c32bde3e7ac4/ADVS-12-2407664-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a3e0/11848636/6ed2c81c60d0/ADVS-12-2407664-g008.jpg

Similar articles

1. Protein-protein interaction and site prediction using transfer learning. Brief Bioinform. 2023 Sep 22;24(6). doi: 10.1093/bib/bbad376.
2. Predicting synonymous codon usage and optimizing the heterologous gene for expression in E. coli. Sci Rep. 2017 Aug 30;7(1):9926. doi: 10.1038/s41598-017-10546-0.
3. ICOR: improving codon optimization with recurrent neural networks. BMC Bioinformatics. 2023 Apr 4;24(1):132. doi: 10.1186/s12859-023-05246-8.
4. Design parameters to control synthetic gene expression in Escherichia coli. PLoS One. 2009 Sep 14;4(9):e7002. doi: 10.1371/journal.pone.0007002.
5. Predicting gene sequences with AI to study codon usage patterns. Proc Natl Acad Sci U S A. 2025 Jan 7;122(1):e2410003121. doi: 10.1073/pnas.2410003121. Epub 2024 Dec 31.
6. Predicting gene expression level from relative codon usage bias: an application to Escherichia coli genome. DNA Res. 2009 Feb;16(1):13-30. doi: 10.1093/dnares/dsn029. Epub 2009 Jan 8.
7. Codon optimization with deep learning to enhance protein expression. Sci Rep. 2020 Oct 19;10(1):17617. doi: 10.1038/s41598-020-74091-z.
8. Predicting gene expression level from codon usage bias. Mol Biol Evol. 2007 Jan;24(1):10-2. doi: 10.1093/molbev/msl148. Epub 2006 Oct 12.
9. Exploring codon context bias for synthetic gene design of a thermostable invertase in Escherichia coli. Enzyme Microb Technol. 2015 Jul-Aug;75-76:57-63. doi: 10.1016/j.enzmictec.2015.04.008. Epub 2015 May 1.
