SCPRED: accurate prediction of protein structural class for sequences of twilight-zone similarity with predicting sequences.

Author information

Kurgan Lukasz, Cios Krzysztof, Chen Ke

Affiliation

Department of Electrical and Computer Engineering, University of Alberta, ECEFR, 9701 116 Street, Edmonton, AB, T6G 2V4, Canada.

Publication information

BMC Bioinformatics. 2008 May 1;9:226. doi: 10.1186/1471-2105-9-226.

DOI:10.1186/1471-2105-9-226

PMID:18452616

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC2391167/

Abstract

BACKGROUND

Protein structure prediction methods provide accurate results when a homologous protein is predicted, while poorer predictions are obtained in the absence of homologous templates. However, some protein chains that share twilight-zone pairwise identity can form similar folds, so determining structural similarity in the absence of sequence similarity would be desirable for structure prediction. The folding type of a protein or its domain is defined as the structural class. Current methods that predict the four structural classes defined in SCOP provide up to 63% accuracy on datasets in which the sequence identity of any pair of sequences falls in the twilight zone. We propose the SCPRED method, which improves prediction accuracy for sequences that share twilight-zone pairwise similarity with the sequences used for the prediction.

RESULTS

SCPRED uses a support vector machine classifier that takes several custom-designed features as its input to predict the structural classes. Following an extensive design process that considered over 2300 index-, composition-, and physicochemical property-based features, along with features based on the predicted secondary structure and content, the classifier's input comprises 8 features based on information extracted from the secondary structure predicted with PSI-PRED and one feature computed from the sequence. Tests performed with datasets of 1673 protein chains, in which any pair of sequences shares twilight-zone similarity, show that SCPRED obtains 80.3% accuracy when predicting the four SCOP-defined structural classes. This is superior to over a dozen recent competing methods based on support vector machines, logistic regression, and ensembles of classifiers.
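The abstract does not list the individual features, so the following is only a minimal sketch of the kind of feature that can be derived from a predicted three-state secondary structure string (H for helix, E for strand, C for coil), such as the output of a predictor like PSI-PRED. The function name `ss_features`, the four features it returns, and the example string are assumptions chosen for illustration; they are not the authors' feature definitions.

```python
from itertools import groupby


def ss_features(ss: str) -> dict:
    """Illustrative (hypothetical) features from a 3-state secondary
    structure string made of H (helix), E (strand), and C (coil)."""
    if not ss:
        raise ValueError("empty secondary structure string")
    if set(ss) - set("HEC"):
        raise ValueError("expected only the states H, E, and C")

    n = len(ss)

    # Length of the longest uninterrupted segment of each state.
    longest = {"H": 0, "E": 0, "C": 0}
    for state, run in groupby(ss):
        longest[state] = max(longest[state], len(list(run)))

    return {
        # Fraction of residues predicted as helix / strand.
        "content_H": ss.count("H") / n,
        "content_E": ss.count("E") / n,
        # Longest helix / strand segment, normalized by chain length.
        "norm_max_seg_H": longest["H"] / n,
        "norm_max_seg_E": longest["E"] / n,
    }


# Made-up 16-residue prediction: one 6-residue helix, one 4-residue strand.
features = ss_features("CCHHHHHHCCEEEECC")
```

A low-dimensional vector of this kind could then be passed to any support vector machine implementation; the choice of kernel and parameters is not specified in the abstract.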

CONCLUSION

SCPRED can accurately find similar structures for sequences that share low identity with the sequences used for the prediction. The high predictive accuracy achieved by SCPRED is attributed to the design of the features, which are capable of separating the structural classes in spite of their low dimensionality. We also demonstrate that SCPRED's predictions can be successfully used as a post-processing filter to improve the performance of modern fold classification methods.
