Angadi Ulavappa B, Venkatesulu M
Department of Computer Applications, Kalasalingam University, Krishnankoil, Srivilliputtur (via), Tamil Nadu, 626190, India.
J Bioinform Comput Biol. 2010 Oct;8(5):825-41. doi: 10.1142/s0219720010004951.
A major research direction in bioinformatics is predicting protein superfamilies in large databases and classifying a given set of protein domains into superfamilies. Such classification reflects structural, evolutionary and functional relatedness. These relationships are embodied in hierarchical classifications such as the Structural Classification of Proteins (SCOP), which is manually curated. This kind of classification is essential for the structural and functional analysis of proteins, yet a large number of proteins remain unclassified. We propose an unsupervised machine-learning algorithm, based on the FuzzyART neural network, to classify a given set of proteins into SCOP superfamilies. The proposed method is fast-learning and uses an atypical non-linear pattern recognition technique. In this approach, we construct a similarity matrix from the p-values of an all-against-all BLAST comparison, train the network with the FuzzyART unsupervised learning algorithm using the similarity matrix as input vectors, and finally use the trained network to produce a classification at the SCOP superfamily level. We evaluated the performance of our method against existing techniques on six different datasets. The results show that the trained network can classify a given similarity matrix of a set of sequences into SCOP superfamilies with high classification accuracy.
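To make the clustering step concrete, the following is a minimal sketch of a generic Fuzzy ART clusterer applied to the rows of a similarity matrix. It is not the authors' implementation: the function names, the parameter values (`rho`, `alpha`, `beta`) and the hand-made toy matrix are assumptions for illustration, and the abstract does not state how BLAST p-values are mapped to similarities, so the sketch simply assumes entries already scaled to [0, 1].

```python
def complement_code(row):
    """Return the complement-coded vector [a, 1 - a] for values in [0, 1]."""
    return list(row) + [1.0 - a for a in row]


def fuzzy_and(x, w):
    """Component-wise minimum (the fuzzy AND operator)."""
    return [min(a, b) for a, b in zip(x, w)]


def fuzzy_art(rows, rho=0.6, alpha=0.001, beta=1.0, epochs=5):
    """Cluster the rows of a similarity matrix with Fuzzy ART.

    rho   : vigilance (assumed value); higher gives more, tighter categories
    alpha : small choice parameter (assumed value)
    beta  : learning rate; 1.0 corresponds to fast learning
    Returns one category index per input row.
    """
    inputs = [complement_code(r) for r in rows]
    weights = []                     # one weight vector per committed category
    labels = [None] * len(inputs)
    for _ in range(epochs):
        changed = False
        for i, x in enumerate(inputs):
            norm_x = sum(x)          # equals len(row) under complement coding
            # Rank categories by the choice function |x ^ w| / (alpha + |w|).
            order = sorted(
                range(len(weights)),
                key=lambda j: -sum(fuzzy_and(x, weights[j]))
                / (alpha + sum(weights[j])),
            )
            winner = None
            for j in order:
                # Vigilance test: accept the first category that matches well enough.
                if sum(fuzzy_and(x, weights[j])) / norm_x >= rho:
                    winner = j
                    break
            if winner is None:
                # No category passed the test: commit a new one to this input.
                weights.append(list(x))
                winner = len(weights) - 1
            else:
                # Move the winning weight vector toward x ^ w.
                matched = fuzzy_and(x, weights[winner])
                weights[winner] = [
                    beta * m + (1.0 - beta) * w
                    for m, w in zip(matched, weights[winner])
                ]
            if labels[i] != winner:
                labels[i] = winner
                changed = True
        if not changed:              # assignments are stable, stop early
            break
    return labels


# Hand-made toy matrix (not BLAST output): sequences 0-1 and 2-3 form two groups.
toy_similarity = [
    [1.0, 0.9, 0.1, 0.0],
    [0.9, 1.0, 0.0, 0.1],
    [0.1, 0.0, 1.0, 0.8],
    [0.0, 0.1, 0.8, 1.0],
]
toy_labels = fuzzy_art(toy_similarity)
print(toy_labels)
```

On this toy matrix the first two rows fall into one category and the last two into another. In a setting like the one described above, each category would be read as a candidate superfamily, and the vigilance value would need tuning against a curated reference such as SCOP.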