Zhang C T, Chou K C, Maggiora G M
Department of Physics, Tianjin University, China.
Protein Eng. 1995 May;8(5):425-35. doi: 10.1093/protein/8.5.425.
Most globular proteins can be classified into one of four structural classes (all-alpha, all-beta, alpha + beta and alpha/beta) depending upon the type, amount and arrangement of secondary structures present. In this work a new method, based upon fuzzy clustering, is proposed for predicting the structural class of a protein from its amino acid composition. Here, each of the structural classes is described by a fuzzy cluster, and each protein is characterized by its membership degrees: one number between zero and one for each of the four clusters, with the constraint that the four membership degrees sum to unity. A given protein is then assigned to the structural class whose fuzzy cluster has the maximum membership degree. Calculation of membership degrees is carried out using the fuzzy c-means algorithm on a training set of 64 proteins. On the training set, the fuzzy clustering approach produces results comparable with or better than those obtained by other methods. A test set of 27 proteins gave results comparable to those obtained with the training set. The success of this preliminary work on protein structural class prediction suggests that further refinements of the method may lead to improved predictions, and this is currently being investigated.
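The membership-degree scheme described above can be illustrated with a minimal sketch of the standard fuzzy c-means procedure. This is not the authors' implementation: the fuzzifier m = 2, the Euclidean distance, the random initialization, the function names and the synthetic 20-component composition vectors are all assumptions made for illustration, since the abstract does not specify them. The sketch also does not map cluster indices to the four named structural classes.

```python
import numpy as np

def memberships(X, centers, m=2.0, eps=1e-12):
    """Fuzzy membership of each row of X in each cluster; every row sums to 1."""
    # Squared Euclidean distances, shape (n_samples, n_clusters).
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2) + eps
    inv = d2 ** (-1.0 / (m - 1.0))  # equals distance ** (-2 / (m - 1))
    return inv / inv.sum(axis=1, keepdims=True)

def fuzzy_c_means(X, c=4, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Alternate center and membership updates until memberships stabilize."""
    rng = np.random.default_rng(seed)
    # Random initial memberships; each row already sums to 1.
    U = rng.dirichlet(np.ones(c), size=X.shape[0])
    for _ in range(n_iter):
        W = U ** m
        # Each center is the membership-weighted mean of the samples.
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        U_new = memberships(X, centers, m)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U

def predict_cluster(x, centers, m=2.0):
    """Assign one composition vector to the cluster with maximum membership."""
    u = memberships(np.atleast_2d(x), centers, m)[0]
    return int(np.argmax(u)), u

# Synthetic stand-in data (assumption): 64 random 20-component compositions.
rng = np.random.default_rng(1)
X_train = rng.dirichlet(np.ones(20), size=64)
centers, U = fuzzy_c_means(X_train, c=4)

# Membership degrees and predicted cluster for one new composition vector.
label, u = predict_cluster(rng.dirichlet(np.ones(20)), centers)
```

Here `U` has one row per training protein and one column per cluster, and `u` holds the four membership degrees of the new vector. In a real application the composition vectors would come from protein sequences, and each cluster would have to be associated with a structural class using proteins of known class.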