SCPRED: accurate prediction of protein structural class for sequences of twilight-zone similarity with predicting sequences.

Author information

Kurgan Lukasz, Cios Krzysztof, Chen Ke

Affiliation

Department of Electrical and Computer Engineering, University of Alberta, ECEFR, 9701 116 Street, Edmonton, AB, T6G 2V4, Canada.

Publication information

BMC Bioinformatics. 2008 May 1;9:226. doi: 10.1186/1471-2105-9-226.

Abstract

BACKGROUND

Protein structure prediction methods provide accurate results when a homologous protein is predicted, while poorer predictions are obtained in the absence of homologous templates. However, some protein chains that share twilight-zone pairwise identity can form similar folds, and thus determining structural similarity without sequence similarity would be desirable for structure prediction. The folding type of a protein or its domain is defined as the structural class. Current methods that predict the four structural classes defined in SCOP provide up to 63% accuracy for datasets in which the sequence identity of any pair of sequences falls in the twilight zone. We propose the SCPRED method, which improves prediction accuracy for sequences that share twilight-zone pairwise similarity with the sequences used for the prediction.

RESULTS

SCPRED uses a support vector machine classifier that takes several custom-designed features as its input to predict the structural classes. Based on an extensive design process that considered over 2300 index-, composition-, and physicochemical-property-based features, along with features based on the predicted secondary structure and content, the classifier's input includes 8 features based on information extracted from the secondary structure predicted with PSI-PRED and one feature computed from the sequence. Tests performed with datasets of 1673 protein chains, in which any pair of sequences shares twilight-zone similarity, show that SCPRED obtains 80.3% accuracy when predicting the four SCOP-defined structural classes, which is superior to over a dozen recent competing methods based on support vector machine, logistic regression, and ensemble-of-classifiers predictors.
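To make the idea of features "extracted from the predicted secondary structure" concrete, the following is a minimal Python sketch. It is not the SCPRED feature set: the abstract does not list the 8 secondary-structure features, so the function name `ss_features`, the feature names, and the choice of content and normalized longest-segment features are illustrative assumptions. The input is assumed to be a three-state string (H = helix, E = strand, C = coil) such as a secondary structure predictor might produce.

```python
from itertools import groupby


def ss_features(ss: str) -> dict:
    """Compute toy features from a 3-state secondary structure string.

    Hypothetical illustration only; these are NOT the features used by SCPRED.
    """
    if not ss:
        raise ValueError("empty secondary structure string")
    n = len(ss)
    feats = {}

    # Content: fraction of residues assigned to each state.
    for state in "HEC":
        feats[f"content_{state}"] = ss.count(state) / n

    # Collapse the string into (state, run_length) segments.
    runs = [(state, len(list(group))) for state, group in groupby(ss)]

    # Longest helix and strand segment, normalized by chain length.
    for state in "HE":
        longest = max((length for s, length in runs if s == state), default=0)
        feats[f"max_seg_{state}"] = longest / n

    return feats


# Made-up 16-residue prediction, used only to exercise the function.
example = "CCHHHHHHCCEEEECC"
features = ss_features(example)
```

A feature vector of this kind, one row per chain, is what would then be passed to a classifier such as a support vector machine, with the structural class as the label.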

CONCLUSION

SCPRED can accurately find similar structures for sequences that share low identity with the sequences used for the prediction. The high predictive accuracy achieved by SCPRED is attributed to the design of the features, which are capable of separating the structural classes in spite of their low dimensionality. We also demonstrate that SCPRED's predictions can be successfully used as a post-processing filter to improve the performance of modern fold classification methods.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c11e/2391167/b66b8d073818/1471-2105-9-226-1.jpg
