Predicting residue-wise contact orders in proteins by support vector regression.

Authors

Song Jiangning, Burrage Kevin

Affiliation

Advanced Computational Modelling Centre, The University of Queensland, Brisbane Qld 4072, Australia.

Publication details

BMC Bioinformatics. 2006 Oct 3;7:425. doi: 10.1186/1471-2105-7-425.

DOI:10.1186/1471-2105-7-425
PMID:17014735
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC1618864/
Abstract

BACKGROUND

The residue-wise contact order (RWCO) describes the sequence separations between a residue of interest and its contacting residues in a protein sequence. It is a new kind of one-dimensional protein structure that represents the extent of long-range contacts and is considered a generalization of contact order. Together with secondary structure, accessible surface area, the B factor, and contact number, RWCO provides comprehensive and indispensable information for reconstructing the three-dimensional structure of a protein from a set of one-dimensional structural properties. Accurately predicting RWCO values could have many important applications in protein three-dimensional structure prediction and protein folding rate prediction, and could give deep insights into protein sequence-structure relationships.
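To make the definition concrete, here is a minimal sketch of how an RWCO profile might be computed from residue coordinates. The abstract does not give the exact formula, so the hard distance cutoff, the exclusion of near neighbours in sequence, the normalization by chain length, and the names `residue_wise_contact_orders`, `cutoff`, and `min_separation` are all illustrative assumptions rather than the authors' definition.

```python
import math


def residue_wise_contact_orders(coords, cutoff=12.0, min_separation=3):
    """Return one RWCO value per residue.

    coords: list of (x, y, z) tuples, one representative atom per residue.
    cutoff: contact distance threshold (assumed value, same units as coords).
    min_separation: smallest sequence separation counted (assumed value).
    """
    n = len(coords)
    rwco = []
    for i in range(n):
        total = 0
        for j in range(n):
            separation = abs(i - j)
            if separation < min_separation:
                continue  # skip the residue itself and its close sequence neighbours
            if math.dist(coords[i], coords[j]) < cutoff:
                total += separation  # add the sequence separation of each contact
        rwco.append(total / n)  # normalize by chain length (assumption)
    return rwco


# Toy chain: five residues on a straight line, 1.0 unit apart.
toy_chain = [(float(k), 0.0, 0.0) for k in range(5)]
print(residue_wise_contact_orders(toy_chain, cutoff=2.5, min_separation=2))
```

Under these assumptions, a residue whose contacts lie far away in sequence gets a large value, which is the sense in which RWCO reflects long-range contacts.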

RESULTS

We developed a novel approach to predict residue-wise contact order values in proteins based on support vector regression (SVR), starting from primary amino acid sequences. We explored seven different sequence encoding schemes to examine their effects on prediction performance: local sequence in the form of PSI-BLAST profiles; local sequence plus amino acid composition; local sequence plus molecular weight; local sequence plus secondary structure predicted by PSIPRED; local sequence plus molecular weight and amino acid composition; local sequence plus molecular weight and predicted secondary structure; and local sequence plus molecular weight, amino acid composition and predicted secondary structure. Using local sequences with multiple sequence alignments in the form of PSI-BLAST profiles, we could predict the RWCO distribution with a Pearson correlation coefficient (CC) between the predicted and observed RWCO values of 0.55 and a root mean square error (RMSE) of 0.82, based on a well-defined dataset of 680 protein sequences. Incorporating global features such as molecular weight and amino acid composition further improved the prediction performance, giving a CC of 0.57 and an RMSE of 0.79. In addition, adding the secondary structure predicted by PSIPRED significantly improved the prediction performance and yielded the best prediction accuracy, with a CC of 0.60 and an RMSE of 0.78, which is at least comparable to that of other existing methods.
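The encoding schemes and the two reported metrics can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the window half-width, the zero-padding at chain ends, and the helper names `window_features`, `pearson_cc`, and `rmse` are hypothetical, and the abstract does not say how the RWCO values were scaled before the RMSE was computed.

```python
import math


def window_features(profile, center, half_width, global_features=()):
    """Build one input vector for the residue at index `center`.

    profile: per-residue rows (for example PSI-BLAST profile scores).
    Rows inside the window are concatenated; positions beyond the chain
    ends are zero-padded (assumption). Per-protein features such as
    molecular weight or amino acid composition are appended at the end.
    """
    width = len(profile[0])
    features = []
    for pos in range(center - half_width, center + half_width + 1):
        if 0 <= pos < len(profile):
            features.extend(profile[pos])
        else:
            features.extend([0.0] * width)
    features.extend(global_features)
    return features


def pearson_cc(predicted, observed):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    return cov / math.sqrt(var_p * var_o)


def rmse(predicted, observed):
    """Root mean square error between two equal-length sequences."""
    n = len(predicted)
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)
```

In this layout, each of the seven schemes corresponds to a different choice of per-residue rows and appended global features, while the regressor and the evaluation code stay the same.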

CONCLUSION

The SVR method achieves a prediction performance that is competitive with, or at least comparable to, that of previously developed linear regression-based methods for predicting RWCO values. Unlike support vector classification (SVC), which assigns discrete class labels, SVR is well suited to estimating the raw value profiles of the samples. The successful application of the SVR approach in this study reinforces the view that support vector regression is a powerful tool for extracting the protein sequence-structure relationship and for estimating protein structural profiles from amino acid sequences.
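One way to see why SVR suits real-valued profiles is its loss function. The sketch below shows the epsilon-insensitive loss used in the standard epsilon-SVR formulation; the abstract does not state which SVR variant or which `epsilon` the authors used, so both the choice of formulation and the default value here are assumptions.

```python
def epsilon_insensitive_loss(observed, predicted, epsilon=0.1):
    """Loss of standard epsilon-SVR for a single sample.

    Errors smaller than `epsilon` cost nothing; larger errors are
    penalized linearly by the amount they exceed `epsilon`.
    """
    return max(0.0, abs(observed - predicted) - epsilon)


# A prediction inside the epsilon tube is not penalized, one outside is.
print(epsilon_insensitive_loss(1.0, 1.05))
print(epsilon_insensitive_loss(1.0, 1.50))
```

Because the loss depends on the size of the numerical error rather than on which side of a decision boundary a sample falls, the fitted model outputs continuous values such as RWCO directly.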


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e611/1618864/64d0dae10638/1471-2105-7-425-1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e611/1618864/462a55b2c426/1471-2105-7-425-2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e611/1618864/9412498931c5/1471-2105-7-425-3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e611/1618864/467494d79932/1471-2105-7-425-4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e611/1618864/567a3514bd22/1471-2105-7-425-5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e611/1618864/aa1f0e35730e/1471-2105-7-425-6.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e611/1618864/e469273826c2/1471-2105-7-425-7.jpg
