Panda Bishnupriya, Majhi Babita, Thakur Abhimanyu
Department of Computer Science and Engineering, Institute of Technical Education and Research, Siksha 'O' Anusandhan University, Bhubaneswar, Orissa, India.
Department of Computer Science and Information Technology, Guru Ghasidas Vishwavidyalaya (A Central University), Bilaspur, Chhattisgarh, India.
Curr Comput Aided Drug Des. 2019;15(1):45-54. doi: 10.2174/1573409914666180828105228.
Proteins are the most versatile macromolecules and play a crucial role in many biological processes. The amino acid sequence has long been used to predict protein secondary structure. However, the major methods for predicting protein secondary structure class have so far not considered the impact of Gaussian noise on the sequence representation of amino acids, which is one of the important constraints on the functionality of a protein.
In the present research, protein secondary structure class was predicted by applying the Stockwell transform together with Amino Acid Composition (AAC) to the equivalent Electron-Ion Interaction Potential (EIIP) representation of the raw amino acid sequence. The method was evaluated on four benchmark datasets of low sequence homology, namely PDB25, 498, 277, and 204. Furthermore, a random forest classifier with the out-of-bag error estimate and a Support Vector Machine (SVM) with k-fold cross-validation demonstrated the high feature representation potential of the reported approach.
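The feature pipeline described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the EIIP table holds the commonly tabulated values and should be checked against your own reference, the function names (`eiip_encode`, `amino_acid_composition`, `stockwell`, `example_features`) are hypothetical, and the two summary statistics taken from the Stockwell transform are placeholders, since the abstract does not say which transform-domain features were used.

```python
import cmath
import math

# Commonly tabulated EIIP values (Rydberg); an assumption to verify independently.
EIIP = {
    "A": 0.0373, "R": 0.0959, "N": 0.0036, "D": 0.1263, "C": 0.0829,
    "Q": 0.0761, "E": 0.0058, "G": 0.0050, "H": 0.0242, "I": 0.0000,
    "L": 0.0000, "K": 0.0371, "M": 0.0823, "F": 0.0946, "P": 0.0198,
    "S": 0.0829, "T": 0.0941, "W": 0.0548, "Y": 0.0516, "V": 0.0057,
}

def eiip_encode(seq):
    """Map each residue to its EIIP value (raises KeyError on non-standard residues)."""
    return [EIIP[aa] for aa in seq.upper()]

def amino_acid_composition(seq):
    """Fraction of each of the 20 standard residues, in alphabetical order."""
    seq = seq.upper()
    return [seq.count(aa) / len(seq) for aa in sorted(EIIP)]

def dft(x):
    """Naive unnormalised discrete Fourier transform (fine for short signals)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def stockwell(x):
    """Discrete Stockwell transform: rows are frequencies 0..N//2, columns are time."""
    n = len(x)
    spectrum = dft(x)
    shifts = range(-(n // 2), n - n // 2)      # frequency offsets centred on zero
    rows = [[complex(sum(x) / n)] * n]         # zero-frequency row is the signal mean
    for f in range(1, n // 2 + 1):
        # Gaussian window in the frequency domain; its width scales with f
        weights = {m: math.exp(-2 * math.pi ** 2 * m ** 2 / f ** 2) for m in shifts}
        row = []
        for t in range(n):
            acc = sum(spectrum[(m + f) % n] * weights[m]
                      * cmath.exp(2j * math.pi * m * t / n) for m in shifts)
            row.append(acc / n)
        rows.append(row)
    return rows

def example_features(seq):
    """AAC (20 values) plus two placeholder statistics of the transform magnitudes."""
    magnitudes = [abs(v) for row in stockwell(eiip_encode(seq)) for v in row]
    return amino_acid_composition(seq) + [sum(magnitudes) / len(magnitudes),
                                          max(magnitudes)]
```

The naive transform costs roughly cubic time in the sequence length, so an FFT-based version would be needed for full-length proteins. The resulting fixed-length vectors could then be passed to any random forest or SVM implementation.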
With the random forest classifier, the overall prediction accuracy for the PDB25, 498, 277, and 204 datasets was 92.5%, 94.79%, 92.45%, and 88.04%, respectively, whereas with SVM it was 84.66%, 95.32%, 89.29%, and 84.37%, respectively.
An integrated order-function-frequency-time (OFFT) model has been proposed for the prediction of protein secondary structure class. For the first time, we report the effect of Gaussian noise on the prediction accuracy of protein secondary structure class and propose a robust integrated OFFT model that is resistant to such noise.
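One way to probe this kind of noise robustness is to corrupt the numeric sequence representation with zero-mean Gaussian noise at a chosen signal-to-noise ratio before extracting features. The sketch below is an assumption about how such a test could be set up, since the abstract does not state the noise levels or the injection procedure; `add_gaussian_noise` and its `snr_db` parameter are hypothetical names.

```python
import math
import random

def add_gaussian_noise(signal, snr_db, rng=None):
    """Return a copy of `signal` with zero-mean Gaussian noise at the given SNR (dB)."""
    rng = rng or random.Random()
    signal_power = sum(v * v for v in signal) / len(signal)
    # SNR(dB) = 10 * log10(signal_power / noise_power)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [v + rng.gauss(0.0, sigma) for v in signal]
```

For example, `add_gaussian_noise(values, snr_db=10, rng=random.Random(0))` gives a reproducible noisy copy of `values`. Accuracy can then be compared between clean and noisy inputs.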