Hayat Maqsood, Iqbal Nadeem
Department of Computer Science, Abdul Wali Khan University Mardan, Pakistan.
Comput Methods Programs Biomed. 2014 Oct;116(3):184-92. doi: 10.1016/j.cmpb.2014.06.007. Epub 2014 Jun 21.
Proteins control all biological functions in living species. Protein structures fall into four major classes: all-α, all-β, α+β, and α/β. Each class performs different functions according to its nature. Owing to the rapid growth in the number of protein sequences in the databanks, identifying protein structure classes through conventional methods is costly and time-consuming. Given the importance of protein structure classes, it is highly desirable to develop a computational model that discriminates them with high accuracy. For this purpose, we propose an in silico method that incorporates Pseudo Average Chemical Shift and a Support Vector Machine. Two feature extraction schemes, namely Pseudo Amino Acid Composition and Pseudo Average Chemical Shift, are used to extract valuable information from protein sequences. The performance of the proposed model is assessed with the jackknife test on four benchmark datasets: 25PDB, 1189, 640, and 399. The success rates of the proposed model on these four datasets are 84.2%, 85.0%, 86.4%, and 89.2%, respectively. The empirical results indicate that the proposed model compares favorably with existing models reported in the literature so far and might be useful for future research.
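The jackknife test named in the abstract holds out each protein once, trains on all the others, and scores the held-out prediction. The following is a minimal sketch of that evaluation loop only, not the authors' implementation. It rests on several assumptions: plain amino acid composition stands in for Pseudo Amino Acid Composition and Pseudo Average Chemical Shift, a nearest-neighbour rule stands in for the Support Vector Machine, and the four sequences are invented toy data rather than entries from 25PDB, 1189, 640, or 399. All function names are hypothetical.

```python
from collections import Counter
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(sequence):
    """Return the 20-dimensional relative-frequency vector of a sequence.

    Assumption: this simple composition replaces the paper's richer features.
    """
    counts = Counter(sequence)
    total = sum(counts[a] for a in AMINO_ACIDS)
    return [counts[a] / total for a in AMINO_ACIDS]


def nearest_neighbour_predict(train_x, train_y, query):
    """Return the label of the closest training vector (Euclidean distance).

    Assumption: a stand-in classifier, not the SVM used in the paper.
    """
    best = min(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    return train_y[best]


def jackknife_accuracy(features, labels):
    """Hold out each sample once, train on the rest, score the held-out one."""
    correct = 0
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if nearest_neighbour_predict(train_x, train_y, features[i]) == labels[i]:
            correct += 1
    return correct / len(features)


# Invented toy sequences, for illustration only.
toy = [
    ("AAAALLLLEEEE", "all-alpha"),
    ("AALLEEAALLEE", "all-alpha"),
    ("VVVVTTTTYYYY", "all-beta"),
    ("VTYVTYVTYVTY", "all-beta"),
]
features = [amino_acid_composition(seq) for seq, _ in toy]
labels = [cls for _, cls in toy]
accuracy = jackknife_accuracy(features, labels)
```

Because every sample is used for testing exactly once and the split is deterministic, the jackknife gives a single, reproducible success rate per dataset, which is why it suits comparisons across benchmark sets of different sizes.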