Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA.
Vaccine Research Center, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, USA.
Bioinformatics. 2018 Aug 1;34(15):2605-2613. doi: 10.1093/bioinformatics/bty166.
Protein solubility plays a vital role in pharmaceutical research and production yield. For a given protein, the extent of its solubility can represent the quality of its function, and is ultimately defined by its sequence. Thus, it is imperative to develop novel, highly accurate in silico sequence-based protein solubility predictors. In this work, we propose DeepSol, a novel deep learning-based protein solubility predictor. The backbone of our framework is a convolutional neural network that exploits k-mer structure and additional sequence and structural features extracted from the protein sequence.
DeepSol outperformed all known sequence-based state-of-the-art solubility prediction methods and attained an accuracy of 0.77 and a Matthews correlation coefficient of 0.55. The superior prediction accuracy of DeepSol allows screening for sequences with enhanced production capacity and more reliable prediction of the solubility of novel proteins.
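To illustrate what "k-mer structure" refers to, the following is a minimal sketch of splitting a protein sequence into overlapping k-mers and counting them. It is not DeepSol's own preprocessing; the function names `kmers` and `kmer_counts` and the example sequence are hypothetical and chosen only for illustration.

```python
from collections import Counter


def kmers(sequence, k):
    """Return all overlapping k-mers of `sequence`, in order of position."""
    if k <= 0:
        raise ValueError("k must be positive")
    # A sequence shorter than k yields an empty list.
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def kmer_counts(sequence, k):
    """Count how often each k-mer occurs in `sequence`."""
    return Counter(kmers(sequence, k))


# Hypothetical amino-acid fragment, used only as an example input.
example = "MKTAYIAK"
print(kmers(example, 3))
print(kmer_counts(example, 2).most_common(3))
```

A convolutional filter of width k sliding over an encoded sequence responds to windows of k consecutive residues, which is why k-mers are a natural unit for this kind of model.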
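For readers unfamiliar with the two reported metrics, the sketch below computes accuracy and the Matthews correlation coefficient (MCC) from the four cells of a binary confusion matrix, using the standard definitions. The counts in the example are hypothetical and are not taken from the DeepSol evaluation.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient for a binary confusion matrix.

    Ranges from -1 (total disagreement) through 0 (no better than chance)
    to +1 (perfect prediction). Returns 0.0 when the denominator is zero,
    a common convention for degenerate confusion matrices.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts (soluble = positive class), for illustration only.
tp, tn, fp, fn = 40, 37, 13, 10
print(round(accuracy(tp, tn, fp, fn), 2))
print(round(mcc(tp, tn, fp, fn), 2))
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it remains informative when the soluble and insoluble classes are imbalanced.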
DeepSol's best performing models and results are publicly deposited at https://doi.org/10.5281/zenodo.1162886 (Khurana and Mall, 2018).
Supplementary data are available at Bioinformatics online.