DeepSol: a deep learning framework for sequence-based protein solubility prediction.

Author information

Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA.

Vaccine Research Center, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, USA.

Publication information

Bioinformatics. 2018 Aug 1;34(15):2605-2613. doi: 10.1093/bioinformatics/bty166.

Abstract

MOTIVATION

Protein solubility plays a vital role in pharmaceutical research and production yield. For a given protein, the extent of its solubility can represent the quality of its function, and is ultimately defined by its sequence. Thus, it is imperative to develop novel, highly accurate in silico sequence-based protein solubility predictors. In this work, we propose DeepSol, a novel deep learning-based protein solubility predictor. The backbone of our framework is a convolutional neural network that exploits k-mer structure and additional sequence and structural features extracted from the protein sequence.
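As a rough illustration of what "k-mer structure" means here, the sketch below lists the overlapping length-k substrings of an amino acid sequence. It is not DeepSol's implementation: the function names `kmers` and `kmer_counts` and the toy sequence are assumptions made for this example, and in a convolutional network the filters scan such windows implicitly rather than enumerating them.

```python
from collections import Counter


def kmers(sequence: str, k: int) -> list[str]:
    """Return all overlapping k-mers (length-k substrings) of a sequence."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    # A sequence of length n has n - k + 1 overlapping windows of length k;
    # range() is empty when the sequence is shorter than k.
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def kmer_counts(sequence: str, k: int) -> Counter:
    """Count how often each k-mer occurs in the sequence."""
    return Counter(kmers(sequence, k))


if __name__ == "__main__":
    toy_sequence = "MKTAYIAK"  # hypothetical toy sequence, not from the paper
    print(kmers(toy_sequence, 3))
    print(kmer_counts(toy_sequence, 2).most_common(3))
```

A convolutional filter of width k applied to an encoded sequence responds to these same windows, which is one way a CNN can pick up short sequence motifs.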

RESULTS

DeepSol outperformed all known sequence-based state-of-the-art solubility prediction methods and attained an accuracy of 0.77 and a Matthews correlation coefficient of 0.55. The superior prediction accuracy of DeepSol allows screening for sequences with enhanced production capacity and more reliable prediction of the solubility of novel proteins.
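For readers unfamiliar with the two reported metrics, the sketch below computes accuracy and the Matthews correlation coefficient (MCC) from the four cells of a binary confusion matrix (soluble vs. insoluble). The counts in the usage example are invented for illustration and are not DeepSol's results; returning 0.0 when the MCC denominator is zero is a common convention assumed here.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    return (tp + tn) / total


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient for binary classification.

    MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)),
    ranging from -1 (total disagreement) to +1 (perfect prediction).
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Assumed convention: MCC is reported as 0 when any marginal is empty.
        return 0.0
    return (tp * tn - fp * fn) / denominator


if __name__ == "__main__":
    # Hypothetical counts, chosen only to exercise the functions.
    tp, tn, fp, fn = 40, 45, 5, 10
    print(f"accuracy = {accuracy(tp, tn, fp, fn):.3f}")
    print(f"MCC      = {mcc(tp, tn, fp, fn):.3f}")
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it stays informative when the soluble and insoluble classes are unbalanced.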

AVAILABILITY AND IMPLEMENTATION

DeepSol's best performing models and results are publicly deposited at https://doi.org/10.5281/zenodo.1162886 (Khurana and Mall, 2018).

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
