Prediction and analysis of protein solubility using a novel scoring card method with dipeptide composition.

Affiliation

Department of Biological Science and Technology, National Chiao Tung University, Hsinchu, Taiwan.

Publication information

BMC Bioinformatics. 2012;13 Suppl 17(Suppl 17):S3. doi: 10.1186/1471-2105-13-S17-S3. Epub 2012 Dec 13.

DOI:10.1186/1471-2105-13-S17-S3
PMID:23282103
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3521471/
Abstract

BACKGROUND

Existing methods for predicting protein solubility on overexpression in Escherichia coli improve performance by using ensemble classifiers, such as two-stage support vector machine (SVM) based classifiers, and a number of feature types, such as physicochemical properties and amino acid and dipeptide composition, together with feature selection. A simple and easily interpretable method for predicting protein solubility is desirable as an alternative to these complex SVM-based methods.

RESULTS

This study proposes a novel scoring card method (SCM) that uses dipeptide composition only to estimate solubility scores of sequences for predicting protein solubility. SCM calculates the propensities of the 400 individual dipeptides to be soluble using statistical discrimination between the soluble and insoluble proteins of a training data set. The propensity scores of all dipeptides are then further optimized using an intelligent genetic algorithm. The solubility score of a sequence is the sum of all propensity scores weighted by the sequence's dipeptide composition. To evaluate SCM by performance comparison, four data sets of different sizes and with different degrees of variation in experimental conditions were used. The results show that SCM, a simple method with interpretable dipeptide propensities, has promising performance compared with existing SVM-based ensemble methods that use a number of feature types. Furthermore, the propensities of dipeptides and the solubility scores of sequences can provide insights into protein solubility. For example, the analysis of dipeptide scores shows that α-helix structures and thermophilic proteins have a high propensity to be soluble.

CONCLUSIONS

The propensities of individual dipeptides to be soluble vary for proteins under different experimental conditions. To predict protein solubility accurately using SCM, it is better to customize the scoring card of dipeptide propensities by using a training data set obtained under the same specified experimental conditions. The proposed SCM, with its solubility scores and dipeptide propensities, can be easily applied to protein function prediction problems in which dipeptide composition features play an important role.

AVAILABILITY

The datasets used, the source code of SCM, and supplementary files are available at http://iclab.life.nctu.edu.tw/SCM/.
