Guo Jian, Chen Hu, Sun Zhirong, Lin Yuanlie
Institute of Bioinformatics, State Key Laboratory of Biomembrane and Membrane Biotechnology, Department of Biological Sciences and Biotechnology, Tsinghua University, Beijing, China.
Proteins. 2004 Mar 1;54(4):738-43. doi: 10.1002/prot.10634.
A high-performance method was developed for protein secondary structure prediction based on a dual-layer support vector machine (SVM) and position-specific scoring matrices (PSSMs). The SVM is a relatively new machine learning technique that has been applied successfully to problems in bioinformatics, and it usually outperforms traditional machine learning approaches. Performance was further improved by combining PSSM profiles with the SVM analysis. The PSSMs were generated from PSI-BLAST profiles, which contain important evolutionary information. The final predictions were taken from the output of the second SVM layer. On the CB513 data set, the three-state overall per-residue accuracy, Q3, reached 75.2%, while the segment overlap (SOV) accuracy increased to 80.0%. On the CB396 data set, the Q3 of our method reached 74.0% and the SOV reached 78.1%. A web server implementing the method has been constructed and is available at http://www.bioinfo.tsinghua.edu.cn/pmsvm.
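As an illustration of two ideas mentioned in the abstract, the following minimal Python sketch shows how per-residue input vectors might be built from a PSSM and how Q3 is computed. It is not the authors' implementation. The function names `window_features` and `q3`, the `half_width` parameter, and the zero-padding at the sequence ends are assumptions made for illustration; the abstract does not state the window size or the padding scheme. Q3 is computed here by its standard definition: the percentage of residues whose predicted state (helix H, strand E, or coil C) matches the observed state.

```python
from typing import List, Sequence


def window_features(pssm: Sequence[Sequence[float]], half_width: int) -> List[List[float]]:
    """Build one flat feature vector per residue from a sliding PSSM window.

    Each vector concatenates the PSSM rows from position i - half_width to
    i + half_width. Positions outside the sequence are zero-padded
    (an assumption; the abstract does not describe end handling).
    """
    n_cols = len(pssm[0]) if pssm else 0
    pad = [0.0] * n_cols
    features = []
    for i in range(len(pssm)):
        row: List[float] = []
        for j in range(i - half_width, i + half_width + 1):
            row.extend(pssm[j] if 0 <= j < len(pssm) else pad)
        features.append(row)
    return features


def q3(predicted: str, observed: str) -> float:
    """Return the three-state per-residue accuracy (Q3) as a percentage."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings must have equal length")
    if not observed:
        raise ValueError("sequences must not be empty")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


if __name__ == "__main__":
    # Toy 3-residue "PSSM" with 2 columns; a real PSI-BLAST PSSM has 20.
    toy_pssm = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    for vector in window_features(toy_pssm, half_width=1):
        print(vector)
    print(q3("HHEC", "HHCC"))
```

In a two-layer arrangement, vectors like these would feed the first-layer classifiers, and a second layer would then be trained on the first layer's outputs. How the second layer's inputs are assembled is not described in the abstract, so that step is left out of the sketch.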