Prediction of protein solubility based on sequence physicochemical patterns and distributed representation information with DeepSoluE.

Affiliations

School of Software Engineering, Chengdu University of Information Technology, Chengdu, China.

Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China.

Publication information

BMC Biol. 2023 Jan 24;21(1):12. doi: 10.1186/s12915-023-01510-8.

PMID: 36694239
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9875434/

Abstract

BACKGROUND

Protein solubility is a precondition for efficient heterologous protein expression, which underlies most industrial applications, and for functional interpretation in basic research. However, the recurrent formation of inclusion bodies remains an inevitable roadblock in protein science and industry: only about a quarter of proteins can be successfully expressed in soluble form. Although numerous solubility prediction models have been developed, their performance remains unsatisfactory given the current rapid growth in available protein sequences. Hence, it is imperative to develop novel, highly accurate predictors that can prioritize highly soluble proteins and thereby reduce the cost of experimental work.
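
The cost argument can be made concrete with Bayes' rule. The sketch below takes the roughly 25% soluble-expression rate mentioned above as the prior and computes the precision of a prescreening predictor, that is, the fraction of the proteins it selects that are actually soluble. The sensitivity (0.70) and specificity (0.80) are hypothetical values chosen only for illustration, not DeepSoluE's reported performance, and the function names are likewise illustrative.

```python
def expected_precision(prevalence: float, sensitivity: float, specificity: float) -> float:
    """Fraction of proteins predicted soluble that are truly soluble (Bayes' rule)."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


def expected_tests_per_hit(precision: float) -> float:
    """Average number of expression experiments needed per soluble protein found."""
    return 1.0 / precision


# Roughly a quarter of proteins express solubly (figure quoted in the abstract).
base_rate = 0.25
# Hypothetical predictor quality, chosen only for illustration.
sens, spec = 0.70, 0.80

prec = expected_precision(base_rate, sens, spec)
print(f"Without prescreening: {expected_tests_per_hit(base_rate):.1f} experiments per soluble hit")
print(f"With prescreening:    {expected_tests_per_hit(prec):.1f} experiments per soluble hit")
```

With these assumed numbers, the expected number of expression experiments per soluble protein drops from 4.0 to about 1.9. A predictor no better than chance (sensitivity + specificity = 1) leaves the precision equal to the 25% prior, so it saves nothing.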

RESULTS

In this study, we developed a novel tool, DeepSoluE, which predicts protein solubility using a long short-term memory (LSTM) network with hybrid features composed of physicochemical patterns and distributed representations of amino acids. Comparisons showed that the proposed model achieved more accurate and balanced performance than existing tools. Furthermore, we explored the specific features that have a dominant impact on model performance, as well as their interaction effects.
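
The following is a minimal, pure-Python sketch of the general technique described above: per-residue physicochemical values concatenated with a per-residue embedding, run through an LSTM whose final hidden state feeds a logistic output. It is not DeepSoluE's implementation. The abstract does not specify the descriptors, embedding method, or layer sizes, so the Kyte-Doolittle hydropathy scale, the simplified ±1 charge, the random embeddings (standing in for learned distributed representations), the dimensions, and all names here are illustrative assumptions. The weights are random and untrained, so the printed probability only demonstrates the data flow.

```python
import math
import random

# Kyte-Doolittle hydropathy scale (a stand-in physicochemical descriptor;
# the abstract does not say which descriptors DeepSoluE uses).
KD_HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
# Simplified side-chain charge near neutral pH (histidine treated as neutral).
CHARGE = {"K": 1.0, "R": 1.0, "D": -1.0, "E": -1.0}


def make_embeddings(dim, seed=0):
    """Hypothetical random per-residue embeddings standing in for learned ones."""
    rng = random.Random(seed)
    return {aa: [rng.uniform(-0.5, 0.5) for _ in range(dim)] for aa in KD_HYDROPATHY}


def encode(seq, embeddings):
    """Hybrid per-residue vectors: [scaled hydropathy, charge] + embedding."""
    features = []
    for aa in seq.upper():
        if aa not in KD_HYDROPATHY:  # skip non-standard residues such as X or B
            continue
        physchem = [KD_HYDROPATHY[aa] / 4.5, CHARGE.get(aa, 0.0)]
        features.append(physchem + embeddings[aa])
    return features


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyLSTM:
    """Single-layer LSTM plus logistic output; weights are random, not trained."""

    def __init__(self, input_dim, hidden_dim, seed=1):
        rng = random.Random(seed)

        def rand_matrix(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.hidden_dim = hidden_dim
        # Input (i), forget (f), output (o) gates and candidate cell (g),
        # each acting on the concatenation [x_t, h_{t-1}].
        self.W = {gate: rand_matrix(hidden_dim, input_dim + hidden_dim) for gate in "ifog"}
        self.b = {gate: [0.0] * hidden_dim for gate in "ifog"}
        self.w_out = [rng.uniform(-0.1, 0.1) for _ in range(hidden_dim)]
        self.b_out = 0.0

    def _affine(self, gate, z):
        return [sum(w * v for w, v in zip(row, z)) + b
                for row, b in zip(self.W[gate], self.b[gate])]

    def predict_proba(self, features):
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        for x in features:
            z = x + h  # list concatenation: [x_t, h_{t-1}]
            i = [sigmoid(a) for a in self._affine("i", z)]
            f = [sigmoid(a) for a in self._affine("f", z)]
            o = [sigmoid(a) for a in self._affine("o", z)]
            cand = [math.tanh(a) for a in self._affine("g", z)]
            c = [ft * ct + it * gt for ft, ct, it, gt in zip(f, c, i, cand)]
            h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]
        # Final hidden state -> probability of the "soluble" class.
        return sigmoid(sum(w * v for w, v in zip(self.w_out, h)) + self.b_out)


emb = make_embeddings(dim=4)
model = TinyLSTM(input_dim=2 + 4, hidden_dim=8)
features = encode("MKTAYIAKQRQISFVKSHFSRQ", emb)
p = model.predict_proba(features)
print(f"P(soluble) = {p:.3f}  (untrained weights: the value is meaningless)")
```

In a trained model, the embedding table and the LSTM weights would be learned from labeled soluble and insoluble sequences, and a framework layer such as `torch.nn.LSTM` would normally replace the hand-written cell.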

CONCLUSIONS

DeepSoluE is suitable for the prediction of protein solubility in E. coli; it serves as a bioinformatics tool for prescreening of potentially soluble targets to reduce the cost of wet-experimental studies. The publicly available webserver is freely accessible at http://lab.malab.cn/~wangchao/softs/DeepSoluE/ .
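
Prescreening as described above amounts to scoring candidate sequences and forwarding only the most promising ones to wet-lab expression. The webserver's input and output formats are not described here, so this sketch assumes you already have some function that returns a solubility probability per sequence; `parse_fasta`, `prescreen`, the 0.5 threshold, and `placeholder_score` (a toy charged-residue fraction, not a solubility model) are all hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple


def parse_fasta(text: str) -> Dict[str, str]:
    """Map each FASTA header (without '>') to its concatenated sequence."""
    records: Dict[str, str] = {}
    header: Optional[str] = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)  # sequences may span several lines
    if header is not None:
        records[header] = "".join(chunks)
    return records


def prescreen(
    records: Dict[str, str],
    score_fn: Callable[[str], float],
    threshold: float = 0.5,
    top_k: Optional[int] = None,
) -> List[Tuple[str, float]]:
    """Keep sequences scored >= threshold, ranked from most to least likely soluble."""
    scored = [(name, score_fn(seq)) for name, seq in records.items()]
    kept = [item for item in scored if item[1] >= threshold]
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept if top_k is None else kept[:top_k]


def placeholder_score(seq: str) -> float:
    """PLACEHOLDER ONLY: fraction of charged residues, not a solubility model."""
    if not seq:
        return 0.0
    return sum(aa in "DEKR" for aa in seq.upper()) / len(seq)


fasta = """>p1
MKKDE
ERLL
>p2
LLLLIVVAA
>p3
MKDEKREDK
"""
records = parse_fasta(fasta)
hits = prescreen(records, placeholder_score, threshold=0.5)
for name, score in hits:
    print(f"{name}\t{score:.2f}")
```

Keeping the scorer as a parameter means the ranking logic does not depend on how the probabilities were produced; the threshold and `top_k` would be set by the experimental budget rather than fixed in advance.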

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/07e4/9875434/7432eee6775c/12915_2023_1510_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/07e4/9875434/b948aa76c2d9/12915_2023_1510_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/07e4/9875434/378fe80aa597/12915_2023_1510_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/07e4/9875434/7f5d5bc6bc30/12915_2023_1510_Fig4_HTML.jpg
