IEEE Trans Nanobioscience. 2014 Mar;13(1):44-50. doi: 10.1109/TNB.2013.2296050.
In the biological sciences, determining the three-dimensional structure of a protein from its sequence is an important and challenging task. Identifying a protein's fold from its primary sequence is an intermediate step toward discovering that structure. Fold recognition can be done by applying a feature extraction technique that captures the relevant information in the sequence and then employing a suitable classifier to label the unknown protein. Several feature extraction techniques have been developed in the past, but their recognition accuracy has been limited. In this study, we develop a feature extraction technique based on tri-grams computed directly from Position Specific Scoring Matrices (PSSMs). We demonstrate its effectiveness on two benchmark datasets. The proposed technique improves protein fold recognition accuracy by up to 4.4% compared with state-of-the-art feature extraction techniques.
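The abstract does not give the tri-gram formula, so the following is only a minimal sketch under a stated assumption: each feature is the sum, over every window of three consecutive sequence positions, of the product of one PSSM entry from each position. For a 20-letter amino acid alphabet this would yield 20^3 = 8000 features. The function name `pssm_trigram_features` and the toy 3-letter alphabet are hypothetical, chosen for illustration, and the PSSM is assumed to be already normalized to non-negative values.

```python
from itertools import product


def pssm_trigram_features(pssm):
    """Return a flat list of A**3 tri-gram features from an L x A matrix.

    Assumption (not stated in the abstract): feature (m, p, q) is the sum
    over positions i of pssm[i][m] * pssm[i+1][p] * pssm[i+2][q].
    """
    length = len(pssm)
    if length < 3:
        raise ValueError("a tri-gram needs at least three sequence positions")
    alphabet = len(pssm[0])
    if any(len(row) != alphabet for row in pssm):
        raise ValueError("all PSSM rows must have the same number of columns")

    features = [0.0] * (alphabet ** 3)
    # Slide a window of three consecutive rows along the sequence.
    for i in range(length - 2):
        first, second, third = pssm[i], pssm[i + 1], pssm[i + 2]
        for m, p, q in product(range(alphabet), repeat=3):
            # Flatten the (m, p, q) index into one position of the vector.
            features[(m * alphabet + p) * alphabet + q] += (
                first[m] * second[p] * third[q]
            )
    return features


# Toy example: a 3-letter alphabet and a one-hot "PSSM" for the sequence
# 0, 1, 2, 0. With one-hot rows the features reduce to tri-gram counts.
toy_pssm = [
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [1, 0, 0],
]
toy_features = pssm_trigram_features(toy_pssm)
```

With one-hot rows the vector simply counts tri-gram occurrences; with real PSSM rows each window contributes to many tri-grams at once, weighted by its scores. The resulting fixed-length vector could then be passed to any classifier. The abstract does not specify which classifier the authors used.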