Lyons James, Paliwal Kuldip K, Dehzangi Abdollah, Heffernan Rhys, Tsunoda Tatsuhiko, Sharma Alok
School of Engineering, Griffith University, Brisbane, QLD 4111, Australia.
University of Iowa, USA.
J Theor Biol. 2016 Mar 21;393:67-74. doi: 10.1016/j.jtbi.2015.12.018. Epub 2016 Jan 19.
Determining the three-dimensional structure of a protein from its sequence is a challenging task in the biological sciences. Protein fold recognition serves as an intermediate step toward this goal: it assigns a novel protein sequence to one of the known folds. Fold recognition involves extracting features from protein sequences and then classifying those features with a suitable classifier. Several feature extractors have been developed to retrieve useful information from protein sequences; these features are generally derived from the sequential, physicochemical and evolutionary properties of proteins. Recognition accuracy has improved gradually over the last decade, but it has yet to reach a satisfactory, widely accepted level. In this work, we first used HHblits, which performs HMM-HMM alignment, to extract a profile HMM (PHMM) matrix for each protein sequence. We then computed the distance between pairs of PHMM matrices using kernelized dynamic programming. This approach yielded a significant improvement in fold recognition over state-of-the-art feature extractors: recognition accuracy improved by 2.7-11.6% on three benchmark datasets derived from the Structural Classification of Proteins (SCOP).
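The abstract does not spell out the kernel or the dynamic-programming recurrence, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes each PHMM matrix is an L x 20 list of per-position values, uses a Gaussian (RBF) kernel between profile columns (the kernel choice and `sigma` are assumptions), turns the kernel into a feature-space distance, and aligns the two profiles with a dynamic-time-warping-style recurrence. The helper names `profile_distance` and `nearest_fold` are hypothetical, and the 1-nearest-neighbour step is an assumed example of how such a distance could feed a classifier.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical representation: one row per sequence position, one column per
# amino acid (20 entries for a real PHMM; toy examples may use fewer).
Profile = Sequence[Sequence[float]]


def gaussian_kernel(a: Sequence[float], b: Sequence[float], sigma: float = 1.0) -> float:
    """Gaussian (RBF) kernel between two profile columns (assumed kernel choice)."""
    sq = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq / (2.0 * sigma ** 2))


def kernel_distance(a: Sequence[float], b: Sequence[float], sigma: float = 1.0) -> float:
    """Feature-space distance sqrt(k(a,a) + k(b,b) - 2k(a,b)).

    For the Gaussian kernel k(a,a) = k(b,b) = 1; max() guards against
    tiny negative values from floating-point rounding.
    """
    return math.sqrt(max(0.0, 2.0 - 2.0 * gaussian_kernel(a, b, sigma)))


def profile_distance(p: Profile, q: Profile, sigma: float = 1.0) -> float:
    """DTW-style global alignment cost between two profiles, normalised by total length."""
    n, m = len(p), len(q)
    inf = float("inf")
    # D[i][j] = cheapest cost of aligning the first i columns of p with the first j of q.
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = kernel_distance(p[i - 1], q[j - 1], sigma)
            # Extend the best of: stay on q's column, stay on p's column, or advance both.
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m] / (n + m)


def nearest_fold(query: Profile, training: List[Tuple[Profile, str]], sigma: float = 1.0) -> str:
    """Hypothetical 1-NN classifier: return the fold label of the closest training profile."""
    best_profile, best_label = min(training, key=lambda item: profile_distance(query, item[0], sigma))
    return best_label


# Toy 2-column profiles for brevity; real PHMM rows would have 20 entries.
p = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
q = [[0.1, 0.9], [0.8, 0.2], [0.3, 0.7], [0.6, 0.4]]
print(profile_distance(p, p))  # identical profiles align at zero cost: 0.0
print(profile_distance(p, q))
print(nearest_fold(p, [(p, "fold_a"), (q, "fold_b")]))
```

Normalising by `n + m` keeps distances comparable across proteins of different lengths; a real system would likely tune the kernel width and could instead feed the pairwise distances into a kernel-based classifier rather than 1-NN.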