Shen Hong-Bin, Yang Jie, Liu Xiao-Jun, Chou Kuo-Chen
Institute of Image Processing and Pattern Recognition, Shanghai Jiaotong University, Shanghai 200030, China.
Biochem Biophys Res Commun. 2005 Aug 26;334(2):577-81. doi: 10.1016/j.bbrc.2005.06.128.
Predicting protein classification is an important and attractive topic in protein science. The knowledge obtained can provide useful information about the overall structure of a query protein, and the practice itself can stimulate the development of novel predictors that may be applied directly to many other relevant areas. This paper introduces a novel approach, the "supervised fuzzy clustering approach," which uses class label information during the training process. Based on this approach, a set of "if-then" fuzzy rules for predicting protein structural classes is extracted from a training dataset. Tests on two different working datasets show that the overall success rates obtained by the supervised fuzzy clustering approach are all higher than those obtained by the unsupervised fuzzy c-means approach introduced by previous investigators [C.T. Zhang, K.C. Chou, G.M. Maggiora. Protein Eng. (1995) 8, 425-435]. The current predictor is expected to play an important complementary role to other existing predictors in this area, further strengthening the power to predict the structural classes of proteins and their other characteristic attributes.
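To make the idea of using class labels during training concrete, the following is a minimal sketch only and not the authors' algorithm. It assumes a deliberately simplified setup: each class gets a single prototype (the mean of its labelled training vectors), and a query is scored against every prototype with the standard fuzzy c-means membership formula. The function names, the two-dimensional feature vectors, and the labels `class_1` and `class_2` are all hypothetical and chosen purely for illustration.

```python
import math


def class_prototypes(samples, labels):
    """Average the training vectors of each class into one prototype per class.

    This is where the class labels enter training (an illustrative
    simplification, not the procedure described in the paper).
    """
    sums, counts = {}, {}
    for x, y in zip(samples, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, value in enumerate(x):
            acc[i] += value
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def fuzzy_memberships(x, prototypes, m=2.0):
    """Fuzzy c-means style memberships of x in each class; they sum to 1.

    m > 1 is the usual fuzzifier; larger values give softer memberships.
    """
    dists = {y: math.dist(x, p) for y, p in prototypes.items()}
    exact = [y for y, d in dists.items() if d == 0.0]
    if exact:
        # x coincides with a prototype: assign it full membership there.
        return {y: (1.0 if y == exact[0] else 0.0) for y in dists}
    power = 2.0 / (m - 1.0)
    inverse = {y: d ** -power for y, d in dists.items()}
    total = sum(inverse.values())
    return {y: v / total for y, v in inverse.items()}


def predict(x, prototypes, m=2.0):
    """Rule of the form: if x is closest (in the fuzzy sense) to class c, then predict c."""
    memberships = fuzzy_memberships(x, prototypes, m)
    return max(memberships, key=memberships.get)


# Toy, made-up training data: two features per sample, two hypothetical classes.
samples = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
labels = ["class_1", "class_1", "class_2", "class_2"]
prototypes = class_prototypes(samples, labels)

query = [0.7, 0.3]
print(fuzzy_memberships(query, prototypes))
print(predict(query, prototypes))
```

In an unsupervised fuzzy c-means run, the prototypes would instead be found iteratively without looking at the labels, and clusters would be mapped to classes only afterwards. The sketch replaces that step with label-driven prototypes to show, in the simplest possible form, what using label information during training can mean.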