Prediction of Solubility of Proteins in Escherichia coli Based on Functional and Structural Features Using Machine Learning Methods.

Affiliations

School of Life Sciences, Shanghai University, Shanghai, 200444, People's Republic of China.

Department of Pharmacy, Shanghai Children's Medical Center, School of Medicine, Shanghai Jiao Tong University, Shanghai, 200127, China.

Publication Information

Protein J. 2024 Oct;43(5):983-996. doi: 10.1007/s10930-024-10230-z. Epub 2024 Sep 7.

Abstract

Protein solubility is a critical parameter that determines the stability, activity, and functionality of proteins, with far-reaching implications in biotechnology and biochemistry. Accurate prediction and control of protein solubility are essential for successful protein expression and purification in both research and industrial settings. This study collected data on soluble and insoluble proteins. Each protein was mapped to STRING and characterized by functional and structural features, and all of these features were integrated into a 5768-dimensional binary vector encoding each protein. Seven feature-ranking algorithms were applied to the functional/structural features, yielding seven ranked feature lists. Each list was then subjected to incremental feature selection, combined in turn with each of four classification algorithms, to build effective classification models and to identify the functional/structural features most important for classification. Several key features that distinguish soluble from insoluble proteins were identified, including GO:0009987 (cellular process) and GO:0022613 (ribonucleoprotein complex biogenesis). The best classification model, which used a support vector machine with 295 optimized functional/structural features, achieved an F1 score of 0.825 and can serve as a powerful tool for differentiating soluble proteins from insoluble ones.
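The abstract compresses a multi-step pipeline into a few clauses: each protein becomes a binary vector (one bit per functional/structural term), features are ranked, and incremental feature selection (IFS) grows the feature set one ranked feature at a time, keeping the size that maximizes F1. The Python sketch below illustrates that loop under stated assumptions and is not the authors' code. The feature names (`feat_a`, `feat_b`, `feat_c`), the toy proteins, the ranking, and all function names are hypothetical; a nearest-centroid classifier and leave-one-out cross-validation stand in for the paper's classifiers (such as the SVM) and its evaluation protocol, which the abstract does not specify.

```python
from typing import Callable, Sequence

# Hypothetical aliases: one row per protein, one 0/1 entry per feature term.
Matrix = list[list[int]]
Predictor = Callable[[Matrix, list[int], Matrix], list[int]]


def encode(annotations: set[str], vocabulary: Sequence[str]) -> list[int]:
    """Binary encoding: bit j is 1 if the protein carries feature term j."""
    return [1 if term in annotations else 0 for term in vocabulary]


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 of the positive class (label 1, assumed here to mean 'soluble')."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def nearest_centroid(train_x: Matrix, train_y: list[int], test_x: Matrix) -> list[int]:
    """Stand-in classifier (not the paper's SVM): predict the class whose mean
    vector is closest in squared Euclidean distance. Assumes both classes
    occur in the training data."""
    centroids = {}
    for label in (0, 1):
        rows = [x for x, y in zip(train_x, train_y) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def dist(x: list[int], c: list[float]) -> float:
        return sum((a - b) ** 2 for a, b in zip(x, c))

    return [min((0, 1), key=lambda lab: dist(x, centroids[lab])) for x in test_x]


def incremental_feature_selection(
    x: Matrix, y: list[int], ranked: Sequence[int], predict: Predictor = nearest_centroid
) -> tuple[int, float]:
    """For k = 1..len(ranked), keep the top-k ranked features, score them with
    leave-one-out cross-validation, and return (best_k, best_f1)."""
    best_k, best_f1 = 0, -1.0
    for k in range(1, len(ranked) + 1):
        xk = [[row[c] for c in ranked[:k]] for row in x]
        preds: list[int] = []
        for i in range(len(xk)):
            # Hold out sample i and train on the remaining samples.
            train_x = xk[:i] + xk[i + 1:]
            train_y = y[:i] + y[i + 1:]
            preds += predict(train_x, train_y, [xk[i]])
        score = f1_score(y, preds)
        if score > best_f1:  # strict '>' keeps the smallest k on ties
            best_k, best_f1 = k, score
    return best_k, best_f1


# Hypothetical toy data: three feature terms, six proteins (1 = soluble).
VOCAB = ["feat_a", "feat_b", "feat_c"]
PROTEINS = [{"feat_a"}, {"feat_a", "feat_c"}, {"feat_a"},
            {"feat_b"}, {"feat_b", "feat_c"}, {"feat_b"}]
LABELS = [1, 1, 1, 0, 0, 0]
X = [encode(p, VOCAB) for p in PROTEINS]
# Hypothetical ranking that places the uninformative feat_c first.
RANKED = [2, 0, 1]

if __name__ == "__main__":
    print(incremental_feature_selection(X, LABELS, RANKED))  # (2, 1.0)
```

With the uninformative feature ranked first, the top-1 subset scores an F1 of 0.0 and IFS needs two features to separate the classes. In the setting the abstract describes, each of the seven ranked lists would be passed as `ranked` and each of the four classifiers as `predict`, and the selected subset size is whichever k maximizes F1.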
