Cheng Haitao, Sen Taner Z, Kloczkowski Andrzej, Margaritis Dimitris, Jernigan Robert L
Department of Biochemistry, Biophysics and Molecular Biology, L. H. Baker Center for Bioinformatics and Biological Statistics, Iowa State University, 112 Office and Laboratory Building, Ames, IA 50011-3020, USA.
Polymer (Guildf). 2005 May 26;46(12):4314-4321. doi: 10.1016/j.polymer.2005.02.040.
A new method for predicting protein secondary structure from amino acid sequence has been developed. The method is based on multiple sequence alignment of the query sequence, using BLAST, with all other sequences of known structure from the Protein Data Bank (PDB). The fragments of the alignments belonging to proteins from the PDB are then used for further analysis. We studied various schemes for assigning weights to matching segments and calculated normalized scores to predict one of three secondary structure states: α-helix, β-sheet, or coil. We applied several artificial intelligence techniques, namely decision trees (DT), neural networks (NN), and support vector machines (SVM), to improve prediction accuracy and found that SVM gave the best performance. Preliminary data show that combining the fragment mining approach with GOR V (Kloczkowski et al., Proteins 49 (2002) 154-166) for regions of low sequence similarity improves the prediction accuracy.
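The weighting and normalized-score step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Fragment` type, the weight values, and the simple highest-score decision rule are assumptions made for illustration (the abstract reports that an SVM gave the best performance), and the coil fallback for uncovered positions is only a placeholder for the GOR V step.

```python
from typing import Dict, List, NamedTuple

STATES = "HEC"  # H = alpha-helix, E = beta-sheet, C = coil


class Fragment(NamedTuple):
    """Hypothetical container for one aligned PDB fragment."""
    start: int     # 0-based query position where the fragment begins
    ss: str        # known secondary structure of the matched PDB segment
    weight: float  # assumed weight, e.g. derived from alignment similarity


def normalized_scores(query_len: int,
                      fragments: List[Fragment]) -> List[Dict[str, float]]:
    """Accumulate weighted votes per position, then normalize them to sum to 1."""
    totals = [{s: 0.0 for s in STATES} for _ in range(query_len)]
    for frag in fragments:
        for offset, state in enumerate(frag.ss):
            pos = frag.start + offset
            if 0 <= pos < query_len and state in STATES:
                totals[pos][state] += frag.weight
    scores = []
    for t in totals:
        z = sum(t.values())
        # Positions with no aligned fragment keep all-zero scores.
        scores.append({s: (t[s] / z if z > 0 else 0.0) for s in STATES})
    return scores


def predict(query_len: int, fragments: List[Fragment],
            fallback: str = "C") -> str:
    """Pick the highest-scoring state; use `fallback` where nothing aligned."""
    out = []
    for sc in normalized_scores(query_len, fragments):
        if sum(sc.values()) == 0:
            out.append(fallback)  # placeholder for a GOR V style prediction
        else:
            out.append(max(STATES, key=lambda s: sc[s]))
    return "".join(out)


if __name__ == "__main__":
    frags = [Fragment(0, "HHHH", 2.0), Fragment(2, "EEEE", 1.0)]
    print(predict(8, frags))
```

In this toy input, the helix fragment outweighs the strand fragment where the two overlap, and the last two positions have no aligned fragment, so they receive the fallback state. In a pipeline closer to the one described, the normalized scores would serve as input features for a classifier rather than being compared directly.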