Lyons James, Biswas Neela, Sharma Alok, Dehzangi Abdollah, Paliwal Kuldip K
School of Engineering, Griffith University, Australia.
Royal Brisbane and Women's Hospital, Brisbane, Australia.
J Theor Biol. 2014 Aug 7;354:137-45. doi: 10.1016/j.jtbi.2014.03.033. Epub 2014 Mar 31.
In protein fold recognition, a protein is assigned to one of a set of known folds. Fold recognition can be performed by using feature extraction methods to extract relevant information from protein sequences and then applying a classifier to recognize novel protein sequences accurately. Several feature extraction methods have been developed in the past, but their recognition accuracy has been limited. Protein sequences of varying lengths can share the same fold, and sequences belonging to the same fold are very similar when properly aligned. To exploit this, we develop an amino acid alignment method that extracts important features from protein sequences by computing dissimilarity distances between proteins. Each distance is measured between the position specific scoring matrices (PSSMs) of two protein sequences, and these distances are then used in a support vector machine (SVM) framework. We demonstrate the effectiveness of the proposed method on several benchmark datasets. The method significantly improves fold recognition performance, by 4.3-7.6% compared with several other existing feature extraction methods.
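The abstract does not say which alignment algorithm, per-position distance, or normalisation the method uses. The sketch below therefore makes several assumptions. It aligns two PSSMs (each one row of scores per residue) with dynamic time warping (DTW), uses Euclidean distance between rows as the local cost, and divides the total cost by the combined length. It then builds a dissimilarity feature vector from a query's distances to a set of reference proteins. The function names are hypothetical, and this is not the authors' implementation. The resulting vectors could be passed to any SVM implementation, for example scikit-learn's `sklearn.svm.SVC`.

```python
import math
from typing import List, Sequence

# A PSSM is modelled here as one row of scores per residue (L rows x 20 columns).
PSSM = Sequence[Sequence[float]]


def row_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two PSSM rows (assumed local cost)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(p: PSSM, q: PSSM) -> float:
    """Align two PSSMs of possibly different lengths with dynamic time warping.

    Returns the minimal cumulative alignment cost divided by len(p) + len(q);
    this normalisation is an assumption, not taken from the source.
    """
    n, m = len(p), len(q)
    inf = float("inf")
    # d[i][j] = cheapest cost of aligning the first i rows of p with the first j rows of q
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = row_distance(p[i - 1], q[j - 1])
            d[i][j] = cost + min(
                d[i - 1][j],      # p advances, q's residue is reused
                d[i][j - 1],      # q advances, p's residue is reused
                d[i - 1][j - 1],  # both advance (one-to-one match)
            )
    return d[n][m] / (n + m)


def dissimilarity_features(query: PSSM, references: List[PSSM]) -> List[float]:
    """Feature vector: DTW distances from the query to each reference protein."""
    return [dtw_distance(query, ref) for ref in references]


# Toy 20-column "PSSMs" with made-up values (not real data).
a = [[float(k == r % 20) for k in range(20)] for r in range(5)]
b = a[:3] + a[2:]                 # same profile with one residue duplicated
c = [[1.0] * 20 for _ in range(4)]
features = dissimilarity_features(a, [a, b, c])
```

Because DTW may map one residue to several, the duplicated-residue profile `b` is at distance 0 from `a`. This mirrors the idea that sequences of different lengths can still align closely.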