Learning to predict expression efficacy of vectors in recombinant protein production.

Affiliation

Institute of Biomedical Informatics, National Yang-Ming University, Taipei, Taiwan.

Publication information

BMC Bioinformatics. 2010 Jan 18;11 Suppl 1(Suppl 1):S21. doi: 10.1186/1471-2105-11-S1-S21.

DOI:10.1186/1471-2105-11-S1-S21

PMID:20122193

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3009492/

Abstract

BACKGROUND

Recombinant protein production is a useful biotechnology for producing large quantities of highly soluble proteins. Currently, the most widely used production system fuses a target protein into different vectors in Escherichia coli (E. coli). However, the production efficacy of different vectors varies across target proteins, and trial and error is still the common practice for finding out the efficacy of a vector for a given target protein. Previous studies are limited in that they assumed that proteins would be over-expressed and focused only on the solubility of expressed proteins. In fact, many pairings of vectors and proteins result in no expression.

RESULTS

In this study, we applied machine learning to train models that predict whether a vector-protein pairing will or will not express in E. coli. For expressed cases, the models further predict whether the expressed proteins will be soluble. We collected a set of real cases from the clients of our recombinant protein production core facility, where six different vectors were designed and studied. This set of cases is used for both training and evaluation of our models. We evaluate three different models based on support vector machines (SVMs) and their ensembles. Unlike many previous works, these models use the sequence of the target protein as well as the sequence of the whole fusion vector as features. We show that a model that classifies a case directly into one of the three classes (no expression, inclusion body and soluble) outperforms a model that considers the nested structure of the three classes, while a model that can take advantage of the hierarchical structure of the three classes performs slightly worse than, but comparably to, the best model. Compared to previous works, our best method still achieves the highest prediction accuracy. Lastly, we briefly present two methods for using the trained model in the design of recombinant protein production systems to improve the chance of high soluble protein production.
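The difference between a direct three-class decision and a nested (expressed or not, then soluble or not) decision can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say how sequences are encoded or how the SVM outputs are combined, so the amino acid composition encoding, the one-vs-rest rule, and every name below (`composition`, `featurize`, `predict_nested`, `predict_flat`, `Scorer`) are hypothetical. Each scorer stands in for the signed decision value of an already trained binary SVM.

```python
from collections import Counter
from typing import Callable, Sequence

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
CLASSES = ("no expression", "inclusion body", "soluble")

# A scorer stands in for a trained binary SVM: it maps a feature
# vector to a signed decision value (positive means "yes").
Scorer = Callable[[Sequence[float]], float]


def composition(seq: str) -> list[float]:
    """Fraction of each of the 20 standard amino acids in seq."""
    counts = Counter(seq.upper())
    total = sum(counts[a] for a in AMINO_ACIDS) or 1  # avoid division by zero
    return [counts[a] / total for a in AMINO_ACIDS]


def featurize(target_seq: str, fusion_seq: str) -> list[float]:
    """Assumed encoding: composition of the target protein followed by
    composition of the whole fusion construct (40 values in total)."""
    return composition(target_seq) + composition(fusion_seq)


def predict_nested(x: Sequence[float], expressed: Scorer, soluble: Scorer) -> str:
    """Two-stage decision: first expressed or not, then soluble or inclusion body."""
    if expressed(x) <= 0:
        return CLASSES[0]
    return CLASSES[2] if soluble(x) > 0 else CLASSES[1]


def predict_flat(x: Sequence[float], scorers: dict[str, Scorer]) -> str:
    """Direct three-class decision (one-vs-rest is assumed here):
    pick the class whose scorer returns the largest value."""
    return max(CLASSES, key=lambda c: scorers[c](x))
```

In practice the placeholder scorers would be replaced by decision functions of SVMs trained on labeled vector-protein cases; the sketch only shows how the two decision structures differ once such scorers exist.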

CONCLUSION

In this paper, we show that a machine learning approach to predicting the efficacy of a vector for a target protein in a recombinant protein production system is promising and may complement traditional knowledge-driven study of the efficacy. We will release our program to share with other labs in the public domain when this paper is published.
