Prediction of Solubility of Proteins in Escherichia coli Based on Functional and Structural Features Using Machine Learning Methods.

Affiliations

School of Life Sciences, Shanghai University, Shanghai, 200444, People's Republic of China.

Department of Pharmacy, Shanghai Children's Medical Center, School of Medicine, Shanghai Jiao Tong University, Shanghai, 200127, China.

Publication information

Protein J. 2024 Oct;43(5):983-996. doi: 10.1007/s10930-024-10230-z. Epub 2024 Sep 7.

DOI:10.1007/s10930-024-10230-z
PMID:39243320
Abstract

Protein solubility is a critical parameter that determines the stability, activity, and functionality of proteins, with far-reaching implications in biotechnology and biochemistry. Accurate prediction and control of protein solubility are essential for successful protein expression and purification in research and industrial settings. This study gathered data on soluble and insoluble proteins. Each protein was mapped to STRING and characterized by functional and structural features, which were integrated into a 5768-dimensional binary vector that encodes the protein. Seven feature-ranking algorithms were applied to these features, yielding seven ranked feature lists. Each list was then subjected to incremental feature selection, combined with each of four classification algorithms in turn, to build effective classification models and to identify the functional/structural features most important for classification. Several features essential for differentiating soluble from insoluble proteins were identified, including GO:0009987 (intercellular communication) and GO:0022613 (ribonucleoprotein complex biogenesis). The best classification model, which used a support vector machine with 295 optimized functional/structural features, achieved an F1 score of 0.825 and can serve as a powerful tool for differentiating soluble from insoluble proteins.
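The pipeline in the abstract (binary encoding, feature ranking, incremental feature selection, F1 evaluation) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract names neither the seven ranking algorithms nor how models were evaluated, so the sketch substitutes a simple class-frequency-gap ranking and a leave-one-out 1-nearest-neighbour classifier in place of the support vector machine. The six proteins, their annotations, and the `toy_feature_*` names are hypothetical; only the two GO identifiers come from the abstract.

```python
from typing import List, Sequence, Set, Tuple


def encode(annotations: Set[str], vocabulary: Sequence[str]) -> List[int]:
    """Binary vector: 1 if the protein carries the feature, else 0."""
    return [1 if term in annotations else 0 for term in vocabulary]


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 score for the positive class (label 1 = soluble)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def rank_features(X: List[List[int]], y: List[int]) -> List[int]:
    """Stand-in ranking: sort features by the gap in their frequency
    between soluble and insoluble proteins (largest gap first)."""
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == 0]

    def gap(j: int) -> float:
        return abs(sum(r[j] for r in pos) / len(pos)
                   - sum(r[j] for r in neg) / len(neg))

    return sorted(range(len(X[0])), key=gap, reverse=True)


def loo_predict(X: List[List[int]], y: List[int],
                features: Sequence[int]) -> List[int]:
    """Leave-one-out 1-nearest-neighbour prediction (Hamming distance),
    used here as a stand-in for the classifiers in the paper."""
    preds = []
    for i, row in enumerate(X):
        best_label, best_dist = None, None
        for k, other in enumerate(X):
            if k == i:
                continue  # never compare a protein with itself
            dist = sum(row[j] != other[j] for j in features)
            if best_dist is None or dist < best_dist:
                best_label, best_dist = y[k], dist
        preds.append(best_label)
    return preds


def incremental_feature_selection(
    X: List[List[int]], y: List[int], ranking: List[int], step: int = 1
) -> Tuple[List[int], float]:
    """Evaluate the top-k ranked features for growing k and keep the
    smallest subset that reaches the highest F1 score."""
    best_k, best_f1 = 0, -1.0
    for k in range(step, len(ranking) + 1, step):
        score = f1_score(y, loo_predict(X, y, ranking[:k]))
        if score > best_f1:
            best_k, best_f1 = k, score
    return ranking[:best_k], best_f1


# Hypothetical toy data: 4 features instead of 5768, 6 proteins.
vocabulary = ["GO:0022613", "GO:0009987", "toy_feature_a", "toy_feature_b"]
proteins = [
    ({"GO:0022613", "toy_feature_a"}, 1),
    ({"GO:0022613", "GO:0009987"}, 1),
    ({"GO:0022613", "toy_feature_b"}, 1),
    ({"GO:0009987", "toy_feature_a"}, 0),
    ({"toy_feature_b"}, 0),
    ({"GO:0009987"}, 0),
]
X = [encode(annotations, vocabulary) for annotations, _ in proteins]
y = [label for _, label in proteins]

ranking = rank_features(X, y)
selected, best_f1 = incremental_feature_selection(X, y, ranking)
print([vocabulary[j] for j in selected], best_f1)
```

In the study this loop would run once per ranked list and per classifier (seven lists, four algorithms), and the reported optimum was 295 features with a support vector machine. The sketch runs a single ranking with a single classifier only to show the shape of the search.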

Similar articles

1. Prediction of Solubility of Proteins in Escherichia coli Based on Functional and Structural Features Using Machine Learning Methods.
   Protein J. 2024 Oct;43(5):983-996. doi: 10.1007/s10930-024-10230-z. Epub 2024 Sep 7.
2. Periscope: quantitative prediction of soluble protein expression in the periplasm of Escherichia coli.
   Sci Rep. 2016 Mar 2;6:21844. doi: 10.1038/srep21844.
3. A support vector machine-based method for predicting the propensity of a protein to be soluble or to form inclusion body on overexpression in Escherichia coli.
   Bioinformatics. 2006 Feb 1;22(3):278-84. doi: 10.1093/bioinformatics/bti810. Epub 2005 Dec 6.
4. PLM_Sol: predicting protein solubility by benchmarking multiple protein language models with the updated Escherichia coli protein solubility dataset.
   Brief Bioinform. 2024 Jul 25;25(5). doi: 10.1093/bib/bbae404.
5. Prediction and analysis of protein solubility using a novel scoring card method with dipeptide composition.
   BMC Bioinformatics. 2012;13 Suppl 17(Suppl 17):S3. doi: 10.1186/1471-2105-13-S17-S3. Epub 2012 Dec 13.
6. Yield, solubility and conformational quality of soluble proteins are not simultaneously favored in recombinant Escherichia coli.
   Biotechnol Bioeng. 2008 Dec 15;101(6):1353-8. doi: 10.1002/bit.21996.
7. Machine learning applied to enzyme turnover numbers reveals protein structural correlates and improves metabolic models.
   Nat Commun. 2018 Dec 7;9(1):5252. doi: 10.1038/s41467-018-07652-6.
8. Expression of functional Candida antarctica lipase B in a cell-free protein synthesis system derived from Escherichia coli.
   Biotechnol Prog. 2009 Mar-Apr;25(2):589-93. doi: 10.1002/btpr.109.
9. A review of machine learning methods to predict the solubility of overexpressed recombinant proteins in Escherichia coli.
   BMC Bioinformatics. 2014 May 8;15:134. doi: 10.1186/1471-2105-15-134.
10. Machine-learning models for activity class prediction: A comparative study of feature selection and classification algorithms.
    Gait Posture. 2021 Sep;89:45-53. doi: 10.1016/j.gaitpost.2021.06.017. Epub 2021 Jun 24.
