Raghuraj Rao, Lakshminarayanan S
Department of Chemical and Biomolecular Engineering, National University of Singapore, 4 Engineering Drive 4, Singapore 117576, Singapore.
Comput Biol Chem. 2008 Aug;32(4):302-6. doi: 10.1016/j.compbiolchem.2008.03.009. Epub 2008 Apr 1.
The variable predictive model based class discrimination (VPMCD) algorithm is proposed as an effective tool for protein secondary structure classification. The algorithm mathematically represents the characteristic amino acid interactions specific to each protein structure and exploits them to distinguish between structures. The new concept and the VPMCD classifier are established using well-studied benchmark datasets containing four protein classes. The protein samples, selected from the SCOP and PDB databases with varying homology (25-100%) and a non-uniform distribution of samples across classes, pose a challenging classification problem. The performance of the new method is compared with that of advanced classification algorithms such as the component-coupled method, SVM and neural networks. VPMCD provides superior performance on high-homology datasets: 100% classification is achieved in the self-consistency test, and prediction accuracy improves by 5% in the jackknife test. The sensitivity of the new algorithm is investigated by varying the model structures/types and the sequence homology. The VPMCD algorithm, which is simpler to implement, is observed to be a robust classification technique and shows potential for extension to clinical diagnosis and data mining applications in biological systems.
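The abstract does not give the model equations, so the following is only a minimal sketch of the general idea, under stated assumptions: each class is described by one linear model per variable, predicting that variable from the remaining ones, and a new sample is assigned to the class whose models reproduce it with the smallest sum of squared prediction errors. The linear model form, the small ridge term, the toy data and all function names (`fit_class`, `train`, `predict`, `jackknife_accuracy`) are illustrative choices, not taken from the paper.

```python
from typing import Dict, List

Vector = List[float]
RIDGE = 1e-8  # assumption: tiny diagonal term that keeps the normal equations solvable


def solve(a: List[Vector], b: Vector) -> Vector:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def design(row: Vector, i: int) -> Vector:
    """Predictor vector for variable i: an intercept plus all other variables."""
    return [1.0] + [v for j, v in enumerate(row) if j != i]


def fit_class(samples: List[Vector]) -> List[Vector]:
    """Fit one least-squares linear model per variable for a single class."""
    models = []
    for i in range(len(samples[0])):
        rows = [design(r, i) for r in samples]
        y = [r[i] for r in samples]
        k = len(rows[0])
        ata = [[sum(r[u] * r[v] for r in rows) + (RIDGE if u == v else 0.0)
                for v in range(k)] for u in range(k)]
        aty = [sum(r[u] * t for r, t in zip(rows, y)) for u in range(k)]
        models.append(solve(ata, aty))
    return models


def sse(models: List[Vector], row: Vector) -> float:
    """Sum of squared errors when each variable is predicted from the others."""
    total = 0.0
    for i, w in enumerate(models):
        pred = sum(wi * xi for wi, xi in zip(w, design(row, i)))
        total += (row[i] - pred) ** 2
    return total


def train(data: Dict[str, List[Vector]]) -> Dict[str, List[Vector]]:
    return {label: fit_class(samples) for label, samples in data.items()}


def predict(classifier: Dict[str, List[Vector]], row: Vector) -> str:
    """Assign the class whose variable models reproduce the sample best."""
    return min(classifier, key=lambda label: sse(classifier[label], row))


def jackknife_accuracy(data: Dict[str, List[Vector]]) -> float:
    """Leave-one-out test: retrain without each sample, then classify it."""
    hits = total = 0
    for label, samples in data.items():
        for k, held_out in enumerate(samples):
            reduced = dict(data)
            reduced[label] = samples[:k] + samples[k + 1:]
            hits += predict(train(reduced), held_out) == label
            total += 1
    return hits / total


# Toy data (hypothetical): two classes with different inter-variable relations.
grid = [(float(a), float(b)) for a in range(4) for b in range(4)]
toy = {
    "A": [[a, b, 2 * a + b] for a, b in grid],
    "B": [[a, b, a - b + 5.5] for a, b in grid],
}
model = train(toy)
print(predict(model, [1.5, 2.5, 5.5]))
```

In this sketch, classifying the training samples themselves corresponds to the self-consistency test, and `jackknife_accuracy` corresponds to the jackknife (leave-one-out) test mentioned in the abstract. Real inputs would be per-protein feature vectors rather than the synthetic points used here.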