Laboratory of DNA Information Analysis, University of Tokyo, Japan.
J Theor Biol. 2013 Mar 7;320:41-6. doi: 10.1016/j.jtbi.2012.12.008. Epub 2012 Dec 13.
Determining the three-dimensional structure of a protein is a challenging task in biological science. Assigning a protein to one of the known folds is an intermediate step toward deciphering its three-dimensional structure. Protein fold recognition can be performed in two stages: first, a feature extraction technique extracts the relevant information from a protein sequence as accurately as possible; then, a suitable classifier labels an unknown protein. Several feature extraction techniques have been developed in the past, but they achieve only limited recognition accuracy. In this work, we develop a feature extraction technique based on bi-grams computed directly from Position Specific Scoring Matrices (PSSMs) and demonstrate its effectiveness on a benchmark dataset. The proposed technique shows an absolute improvement of around 10% over existing feature extraction techniques.
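The following is a minimal Python sketch of bi-gram features computed from a PSSM. It assumes the common formulation in which P is an L × 20 PSSM, with one row per residue and one column per standard amino acid. Under that assumption, each feature is B(m, n) = Σ_{i=1}^{L−1} P(i, m) · P(i+1, n), which yields 20 × 20 = 400 features per protein. The sketch omits any PSSM normalization step and the downstream classifier. The function name `pssm_bigram_features` is hypothetical and is not taken from the paper.

```python
from typing import List

# An L x k matrix: one row per residue position, one column per amino acid
# (k = 20 for a standard PSSM; assumed already normalized, if at all).
Matrix = List[List[float]]


def pssm_bigram_features(pssm: Matrix) -> List[float]:
    """Hypothetical helper: return the k*k bi-gram features of an L x k PSSM.

    Assumed formulation: B[m][n] = sum over i of pssm[i][m] * pssm[i + 1][n],
    taken over every pair of consecutive residues (i = 0 .. L-2).
    The k x k matrix B is flattened row by row into a feature vector.
    """
    if not pssm:
        raise ValueError("PSSM must have at least one row")
    width = len(pssm[0])
    if any(len(row) != width for row in pssm):
        raise ValueError("all PSSM rows must have the same length")

    bigrams = [[0.0] * width for _ in range(width)]
    # Pair each residue's row with the row of the residue that follows it.
    for current, nxt in zip(pssm, pssm[1:]):
        for m in range(width):
            for n in range(width):
                bigrams[m][n] += current[m] * nxt[n]

    # Flatten to a single vector (400 values when width == 20).
    return [value for row in bigrams for value in row]


if __name__ == "__main__":
    toy = [[1, 2], [3, 4], [5, 6]]  # tiny 3 x 2 example instead of L x 20
    print(pssm_bigram_features(toy))  # [18.0, 22.0, 26.0, 32.0]
```

On a real PSSM with 20 columns, the function returns a fixed-length vector of 400 values regardless of the protein's length L. This property lets any standard classifier consume the features directly.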