Prediction of protein relative solvent accessibility with a two-stage SVM approach.

Author information

Nguyen Minh N, Rajapakse Jagath C

Affiliation

BioInformatics Research Centre, School of Computer Engineering, Nanyang Technological University, Singapore.

Publication information

Proteins. 2005 Apr 1;59(1):30-7. doi: 10.1002/prot.20404.

PMID:15696542

Abstract

Information on the relative solvent accessibility (RSA) of amino acid residues in proteins provides valuable clues for predicting protein structure and function. A two-stage approach with support vector machines (SVMs) is proposed, in which a second SVM predictor is applied to the output of the single-stage SVM approach so that the contextual relationships among solvent accessibilities are taken into account in the prediction. Using the position-specific scoring matrices (PSSMs) generated by PSI-BLAST, the two-stage SVM approach achieves accuracies of up to 90.4% and 90.2% on the Manesh data set of 215 protein structures and the RS126 data set of 126 nonhomologous globular proteins, respectively, which are better than the highest scores published on both data sets to date. A Web server for protein RSA prediction using the two-stage SVM method has been developed and is available at http://birc.ntu.edu.sg/~pas0186457/rsa.html.
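The two-stage idea described in the abstract can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: it treats RSA prediction as two-state classification (labels +1/-1, e.g. buried vs. exposed at some RSA threshold), substitutes a linear SVM trained by Pegasos-style stochastic subgradient descent for the SVM software used in the paper, and zero-pads sequence windows at the chain termini. All function names (`window_features`, `train_linear_svm`, `two_stage_fit`, `two_stage_predict`) and the window half-widths `half1`/`half2` are hypothetical. Stage 1 scores each residue from a window of PSSM rows; stage 2 re-classifies each residue from a window of stage-1 scores, which is where the context among neighbouring accessibilities enters.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical helpers illustrating a two-stage SVM; not the authors' code.

def window_features(rows: Sequence[Sequence[float]], i: int, half: int) -> List[float]:
    """Concatenate rows i-half..i+half, zero-padding positions past either terminus."""
    width = len(rows[0])
    feats: List[float] = []
    for j in range(i - half, i + half + 1):
        if 0 <= j < len(rows):
            feats.extend(rows[j])
        else:
            feats.extend([0.0] * width)
    return feats

def train_linear_svm(X: Sequence[Sequence[float]], y: Sequence[int],
                     lam: float = 0.01, epochs: int = 50, seed: int = 0) -> List[float]:
    """Pegasos: stochastic subgradient descent on the L2-regularized hinge loss.
    A constant 1.0 feature is appended so the bias is learned as an ordinary weight."""
    rng = random.Random(seed)
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1.0 - eta * lam) * wi for wi in w]  # shrink (regularization step)
            if margin < 1.0:  # hinge loss active: step toward the correct side
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w

def decision(w: Sequence[float], x: Sequence[float]) -> float:
    """Signed SVM score w . [x, 1]."""
    return sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))

def predict(w: Sequence[float], x: Sequence[float]) -> int:
    return 1 if decision(w, x) >= 0.0 else -1

def stage1_scores(w1: Sequence[float], pssm: Sequence[Sequence[float]], half1: int) -> List[List[float]]:
    """One stage-1 score per residue, wrapped as 1-column rows for windowing."""
    return [[decision(w1, window_features(pssm, i, half1))] for i in range(len(pssm))]

def two_stage_fit(pssms, labels, half1: int = 5, half2: int = 3,
                  lam: float = 0.01, epochs: int = 50) -> Tuple[List[float], List[float]]:
    # Stage 1: per-residue windows of PSSM rows -> SVM.
    X1 = [window_features(p, i, half1) for p in pssms for i in range(len(p))]
    y = [lab for labs in labels for lab in labs]
    w1 = train_linear_svm(X1, y, lam, epochs)
    # Stage 2: windows of stage-1 scores -> second SVM (contextual step).
    X2: List[List[float]] = []
    for p in pssms:
        scores = stage1_scores(w1, p, half1)
        X2.extend(window_features(scores, i, half2) for i in range(len(p)))
    w2 = train_linear_svm(X2, y, lam, epochs)
    return w1, w2

def two_stage_predict(model, pssm, half1: int = 5, half2: int = 3) -> List[int]:
    w1, w2 = model
    scores = stage1_scores(w1, pssm, half1)
    return [predict(w2, window_features(scores, i, half2)) for i in range(len(pssm))]
```

In a fuller setup, the stage-2 training inputs would likely be better taken from cross-validated stage-1 scores than from a stage-1 model fitted on the same residues, so that stage 2 learns from scores resembling those it sees at prediction time; a kernel SVM could also replace the linear one.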