GATSol, an enhanced predictor of protein solubility through the synergy of 3D structure graph and large language modeling.

Affiliation

College of Biotechnology and Pharmaceutical Engineering, Nanjing Tech University, 30 South Puzhu Road, Jiangbei New District, Nanjing, 211816, Jiangsu, People's Republic of China.

Publication information

BMC Bioinformatics. 2024 Jun 1;25(1):204. doi: 10.1186/s12859-024-05820-8.

Abstract

BACKGROUND

Protein solubility is a critically important physicochemical property closely related to protein expression. For example, it is one of the main factors to be considered in the design and production of antibody drugs and a prerequisite for realizing various protein functions. Although several solubility prediction models have emerged in recent years, many of these models are limited to capturing information embedded in one-dimensional amino acid sequences, resulting in unsatisfactory predictive performance.

RESULTS

In this study, we introduce a novel Graph Attention network-based protein Solubility model, GATSol, which represents the 3D structure of a protein as a protein graph. In addition to amino acid node features extracted by a state-of-the-art protein large language model, GATSol uses amino acid distance maps generated with the latest AlphaFold technology. Rigorous testing on the independent eSOL and Saccharomyces cerevisiae test datasets shows that GATSol outperforms most recently introduced models, especially with respect to the coefficient of determination R², which reaches 0.517 and 0.424, respectively. It outperforms the current state-of-the-art GraphSol by 18.4% on the S. cerevisiae_test set.
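The coefficient of determination reported above can be computed as in the following minimal sketch. It assumes the common definition R² = 1 − SS_res / SS_tot; the abstract does not state which definition the authors use. The function name `r_squared` and the solubility values are hypothetical and invented purely for illustration.

```python
import numpy as np


def r_squared(y_true, y_pred):
    """Coefficient of determination, assuming R^2 = 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return 1.0 - ss_res / ss_tot


# Hypothetical solubility values in [0, 1]; not taken from eSOL or any real dataset.
measured = [0.10, 0.40, 0.55, 0.90]
predicted = [0.20, 0.35, 0.60, 0.80]
score = r_squared(measured, predicted)
```

Under this definition a model that always predicts the mean of the measured values scores 0, and a perfect predictor scores 1.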

CONCLUSIONS

GATSol captures the 3D features of proteins by building protein graphs, which significantly improves the accuracy of protein solubility prediction. Recent advances in protein structure modeling allow our method to incorporate spatial structure features extracted from predicted structures while requiring only protein sequences as input, which simplifies the entire graph neural network prediction process and makes it more user-friendly and efficient. As a result, GATSol may help prioritize highly soluble proteins, ultimately reducing the cost and effort of experimental work. The source code and data of the GATSol model are freely available at https://github.com/binbinbinv/GATSol.
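The idea of turning a distance map into a protein graph and passing node features through graph attention can be illustrated with a small NumPy sketch. This is not the GATSol implementation. The 10 Å contact cutoff, the single attention head, the function names, the toy coordinates, and the random features standing in for language-model embeddings are all assumptions made for illustration.

```python
import numpy as np


def distance_map(coords):
    """Pairwise Euclidean distances between residue coordinates, shape (N, N)."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.linalg.norm(diff, axis=-1)


def contact_adjacency(dist, cutoff=10.0):
    """Boolean adjacency: residues within `cutoff` are linked (assumed cutoff).

    The diagonal is zero distance, so every node keeps a self-loop.
    """
    return dist <= cutoff


def attention_layer(h, adj, W, a_src, a_dst, slope=0.2):
    """One single-head graph attention aggregation step.

    Scores follow e_ij = LeakyReLU(a_src . z_i + a_dst . z_j) with z = h W,
    normalised by a softmax over the neighbours of node i.
    """
    z = h @ W
    e = (z @ a_src)[:, None] + (z @ a_dst)[None, :]
    e = np.where(e > 0, e, slope * e)        # LeakyReLU
    e = np.where(adj, e, -np.inf)            # mask out non-neighbours
    e = e - e.max(axis=1, keepdims=True)     # numerical stability
    weights = np.exp(e)
    alpha = weights / weights.sum(axis=1, keepdims=True)
    return alpha @ z, alpha


# Toy chain: five residues on a line, 6 Angstrom apart (hypothetical coordinates).
coords = np.array([[6.0 * i, 0.0, 0.0] for i in range(5)])
adj = contact_adjacency(distance_map(coords))

rng = np.random.default_rng(0)
h = rng.normal(size=(5, 8))   # stand-in for per-residue language-model embeddings
W = rng.normal(size=(8, 4))   # projection weights (random here, learned in practice)
a_src = rng.normal(size=4)
a_dst = rng.normal(size=4)

out, alpha = attention_layer(h, adj, W, a_src, a_dst)
```

In this toy chain each residue attends only to itself and its immediate neighbours, because residues two positions apart are 12 Å away and fall outside the assumed cutoff. A trained model would learn `W`, `a_src`, and `a_dst`, and would typically stack several such layers before a pooling step and a regression head.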
