Structure-aware protein solubility prediction from sequence through graph convolutional network and predicted contact map.

Authors

Chen Jianwen, Zheng Shuangjia, Zhao Huiying, Yang Yuedong

Affiliations

School of Data and Computer Science, Sun Yat-Sen University, Guangzhou, China.

Sun Yat-Sen Memorial Hospital, Sun Yat-Sen University, Guangzhou, China.

Publication information

J Cheminform. 2021 Feb 8;13(1):7. doi: 10.1186/s13321-021-00488-1.

DOI:10.1186/s13321-021-00488-1

PMID:33557952

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC7869490/

Abstract

Protein solubility is important for producing new soluble proteins that can reduce the cost of biocatalysts or therapeutic agents. A computational model that accurately predicts protein solubility from the amino acid sequence is therefore highly desirable. Many methods have been developed, but most rely on one-dimensional embeddings of amino acids, which are limited in their ability to capture spatial structural information. In this study, we developed GraphSol, a new structure-aware method that predicts protein solubility with an attentive graph convolutional network (GCN), where the protein topology attribute graph is constructed from contact maps predicted from the sequence alone. GraphSol was shown to substantially outperform other sequence-based methods. The model proved stable, with a consistent R² of 0.48 in both the cross-validation and the independent test on the eSOL dataset. To the best of our knowledge, this is the first study to use a GCN for sequence-based protein solubility prediction. More importantly, the architecture could easily be extended to other protein prediction tasks that take a raw protein sequence as input.
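
The abstract does not give GraphSol's implementation details, so the following is only a minimal pure-Python sketch of the general idea it describes: turn a predicted contact map into a residue graph, apply one graph convolution, and pool the residues with attention into a single solubility score. The function names, the 0.5 contact threshold, the dot-product attention readout, and the sigmoid output are all illustrative assumptions, not the authors' code.

```python
import math

def contact_graph(contact_probs, threshold=0.5):
    """Binarize a predicted contact map into an adjacency matrix.
    Self-loops are added so each residue keeps its own features.
    The threshold is an assumption chosen for illustration."""
    n = len(contact_probs)
    return [[1.0 if i == j or contact_probs[i][j] >= threshold else 0.0
             for j in range(n)] for i in range(n)]

def normalize(adj):
    """Symmetric normalization D^(-1/2) A D^(-1/2), as in the standard GCN."""
    deg = [sum(row) for row in adj]  # always >= 1 because of self-loops
    n = len(adj)
    return [[adj[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]

def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def gcn_layer(adj_norm, feats, weight):
    """One graph convolution: ReLU(A_norm @ X @ W)."""
    hidden = matmul(matmul(adj_norm, feats), weight)
    return [[max(0.0, v) for v in row] for row in hidden]

def attention_pool(node_feats, query):
    """Softmax-weighted sum of residue embeddings, giving one vector per protein."""
    scores = [sum(h * q for h, q in zip(row, query)) for row in node_feats]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # shift for numerical stability
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(node_feats[0])
    pooled = [sum(w * row[d] for w, row in zip(weights, node_feats))
              for d in range(dim)]
    return pooled, weights

def predict_solubility(contact_probs, feats, weight, query, out_w, out_b=0.0):
    """Contact map + per-residue features -> a score in (0, 1)."""
    adj_norm = normalize(contact_graph(contact_probs))
    hidden = gcn_layer(adj_norm, feats, weight)
    pooled, _ = attention_pool(hidden, query)
    logit = sum(p * w for p, w in zip(pooled, out_w)) + out_b
    return 1.0 / (1.0 + math.exp(-logit))

# Toy example: 3 residues, 2 input features, 2 hidden units (made-up numbers).
contacts = [[1.0, 0.9, 0.1], [0.9, 1.0, 0.6], [0.1, 0.6, 1.0]]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
gcn_weight = [[0.5, -0.5], [0.25, 0.75]]
score = predict_solubility(contacts, features, gcn_weight,
                           query=[1.0, 1.0], out_w=[0.3, -0.2])
print(score)
```

In a trained model the weight matrix, attention query, and output weights would be learned, and the per-residue features would come from the sequence; here they are fixed toy values so that the data flow is easy to follow.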

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f335/7869490/f323e08b0da8/13321_2021_488_Fig1_HTML.jpg
