Gubbi Jayavardhana, Shilton Alistair, Parker Michael, Palaniswami Marimuthu
Department of Electrical and Electronics Engineering, The University of Melbourne, Parkville, Victoria 3010, Australia.
Genome Inform. 2006;17(2):259-69.
Determining the first 3-D model of a protein from its sequence alone is a non-trivial problem. The first 3-D model is the key to the molecular replacement method of solving the phase problem in X-ray crystallography. If the sequence identity is more than 30%, homology modelling can be used to determine the correct topology (as defined by CATH) or fold (as defined by SCOP). If the sequence identity is less than 25%, however, the task is very challenging. In this paper we address the topology classification of proteins with sequence identity of less than 25%. The inputs to the system are the amino acid sequence, the predicted secondary structure and the predicted real-valued relative solvent accessibility. A two-stage support vector machine (SVM) approach is proposed that classifies the sequences into three structural classes (alpha, beta, alpha+beta) in the first stage and into 39 topologies in the second stage. The method is evaluated using a newly curated dataset from CATH with maximum pairwise sequence identity of less than 25%. Overall accuracies of 87.44% and 83.15% are reported for class and topology prediction, respectively. In the class prediction stage, a sensitivity of 0.77 and a specificity of 0.91 are obtained. The data file, the SVM implementation (SVMHEAVY) and the result files can be downloaded from http://www.ee.unimelb.edu.au/ISSNIP/downloads/.
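The abstract reports overall accuracy together with sensitivity and specificity for the class prediction stage. As a minimal sketch of how such figures are commonly computed for a three-class problem, the Python below treats one class at a time as the positive class (one-vs-rest). The label lists are invented toy data, and the function names `one_vs_rest_metrics` and `overall_accuracy` are hypothetical; the abstract does not state whether the reported 0.77 and 0.91 are averaged over classes or computed in some other way, so this is not necessarily the authors' procedure.

```python
def one_vs_rest_metrics(y_true, y_pred, positive):
    """Return (sensitivity, specificity) with `positive` as the positive class."""
    tp = fn = fp = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1  # positive sample correctly recognised
            else:
                fn += 1  # positive sample missed
        else:
            if pred == positive:
                fp += 1  # negative sample wrongly labelled positive
            else:
                tn += 1  # negative sample correctly rejected
    # Guard against classes with no positive or no negative samples.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


def overall_accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    correct = sum(truth == pred for truth, pred in zip(y_true, y_pred))
    return correct / len(y_true)


# Toy labels for illustration only; these are NOT data from the paper.
y_true = ["alpha", "alpha", "beta", "beta", "alpha+beta", "alpha+beta"]
y_pred = ["alpha", "beta", "beta", "beta", "alpha+beta", "alpha"]

for structural_class in ("alpha", "beta", "alpha+beta"):
    sens, spec = one_vs_rest_metrics(y_true, y_pred, structural_class)
    print(f"{structural_class}: sensitivity={sens:.2f}, specificity={spec:.2f}")
print(f"overall accuracy={overall_accuracy(y_true, y_pred):.4f}")
```

Reporting sensitivity and specificity alongside accuracy is useful when classes are unevenly populated, since a classifier can reach high accuracy while still missing most members of a small class.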