Protein fold recognition using HMM-HMM alignment and dynamic programming.

Author information

Lyons James, Paliwal Kuldip K, Dehzangi Abdollah, Heffernan Rhys, Tsunoda Tatsuhiko, Sharma Alok

Affiliations

School of Engineering, Griffith University, Brisbane, QLD 4111, Australia.

University of Iowa, USA.

Publication information

J Theor Biol. 2016 Mar 21;393:67-74. doi: 10.1016/j.jtbi.2015.12.018. Epub 2016 Jan 19.

Abstract

Determining the three-dimensional structure of a protein from its sequence is a challenging task in the biological sciences. Protein fold recognition serves as an intermediate step that classifies a novel protein sequence into one of the known folds. The process comprises feature extraction from protein sequences followed by classification of those features with a suitable classifier. Several feature extractors have been developed to retrieve useful information from protein sequences; the features are generally built from a protein's sequential, physicochemical, and evolutionary properties. Recognition accuracy has improved gradually over the last decade, but it has yet to reach a widely accepted level. In this work, we first applied the HMM-HMM alignment of protein sequences from HHblits to extract a profile HMM (PHMM) matrix. We then computed the distance between the respective PHMM matrices using kernelized dynamic programming. We recorded a significant improvement in fold recognition over state-of-the-art feature extractors: recognition accuracy improved by 2.7-11.6% on three benchmark datasets from the Structural Classification of Proteins (SCOP).
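
The second step of the abstract, comparing two PHMM matrices by dynamic programming over a kernel similarity, can be illustrated with a minimal sketch. The abstract does not give the kernel, the recursion, or the normalization the authors used, so everything below is an assumption made for illustration: a Gaussian (RBF) kernel between profile rows, a global alignment with no gap penalty, and normalization by the shorter sequence length. The names `row_kernel`, `alignment_score`, `profile_distance`, and the parameter `gamma` are hypothetical. Each profile is assumed to be already available as a list of per-residue rows; parsing HHblits output is not shown.

```python
import math


def row_kernel(p, q, gamma=1.0):
    """Gaussian (RBF) similarity between two profile rows, in (0, 1].

    gamma is an assumed bandwidth parameter, not a value from the paper.
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(p, q))
    return math.exp(-gamma * sq_dist)


def alignment_score(A, B, gamma=1.0):
    """Best cumulative row similarity over all monotone alignments of A and B.

    S[i][j] holds the best score for the first i rows of A and the first
    j rows of B. Skipping a row (a gap) costs nothing in this sketch.
    """
    n, m = len(A), len(B)
    S = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = S[i - 1][j - 1] + row_kernel(A[i - 1], B[j - 1], gamma)
            S[i][j] = max(match, S[i - 1][j], S[i][j - 1])
    return S[n][m]


def profile_distance(A, B, gamma=1.0):
    """Distance in [0, 1]; 0 when the two profiles are identical.

    At most min(len(A), len(B)) rows can be matched, and each match adds
    at most 1, so dividing by that length bounds the score by 1.
    """
    if not A or not B:
        raise ValueError("profiles must contain at least one row")
    score = alignment_score(A, B, gamma)
    return 1.0 - score / min(len(A), len(B))


if __name__ == "__main__":
    # Toy 2-column "profiles"; real PHMM rows would have one column per
    # amino acid (plus any transition terms the feature set includes).
    a = [[1.0, 0.0], [0.0, 1.0]]
    b = [[0.0, 1.0], [1.0, 0.0]]
    print(profile_distance(a, a))
    print(profile_distance(a, b))
```

A pairwise distance of this kind could feed a nearest-neighbour or kernel-based classifier, but which classifier the authors paired with their distance is not stated in the abstract.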
