Zhang Lichao, Kong Liang, Han Xiaodong, Lv Jinfeng
School of Mathematics and Statistics, Northeastern University at Qinhuangdao, Qinhuangdao 066004, PR China.
School of Mathematics and Information Science & Technology, Hebei Normal University of Science & Technology, Qinhuangdao 066004, PR China.
J Theor Biol. 2016 Jul 7;400:1-10. doi: 10.1016/j.jtbi.2016.04.011. Epub 2016 Apr 12.
Protein structural class prediction plays an important role in protein structure and function analysis, drug design, and many other biological applications. Extracting a good representation from the protein sequence is fundamental to this prediction task. Although several secondary structure based feature extraction strategies have been proposed in recent years specifically for low-similarity protein sequences, prediction accuracy remains limited. To explore the potential of secondary structure information, this study proposes a novel feature extraction method based on the chaos game representation of the predicted secondary structure, designed mainly to capture sequence order information and the distribution of secondary structure segments in a given protein sequence. Several kinds of prediction accuracy obtained with the jackknife test are reported on three widely used low-similarity benchmark datasets (25PDB, 1189 and 640). Compared with state-of-the-art prediction methods, the proposed method achieves the highest overall accuracy on all three datasets. The experimental results confirm that the proposed feature extraction method is effective for accurate prediction of protein structural class. Moreover, the proposed method could potentially be extended to other graphical representations of protein sequences and may be helpful in future research.
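The abstract does not spell out how the chaos game representation is built or which features are derived from it. As a minimal sketch only, assuming the predicted secondary structure is a string over the three states H (helix), E (strand) and C (coil), and assuming each state is assigned to one corner of an equilateral triangle, the representation can be generated by repeatedly moving halfway from the current point toward the corner of the next state. The vertex placement, the starting point, and the helper names `cgr_points` and `segment_counts` are illustrative assumptions, not the authors' published construction.

```python
import math

# Assumed vertex placement: one corner of an equilateral triangle per state.
VERTICES = {
    "H": (0.0, 0.0),
    "E": (1.0, 0.0),
    "C": (0.5, math.sqrt(3) / 2),
}


def cgr_points(ss, start=(0.5, math.sqrt(3) / 6)):
    """Return chaos game coordinates for a secondary structure string.

    Each step moves halfway from the current point toward the vertex of the
    next state, so a point's position depends on the order of the preceding
    states. The default start is the triangle's centroid (an assumption).
    """
    x, y = start
    points = []
    for state in ss:
        vx, vy = VERTICES[state]  # raises KeyError for states other than H/E/C
        x, y = (x + vx) / 2, (y + vy) / 2
        points.append((x, y))
    return points


def segment_counts(ss):
    """Count maximal runs (segments) of each state in the string."""
    counts = {"H": 0, "E": 0, "C": 0}
    prev = None
    for state in ss:
        if state != prev:
            counts[state] += 1  # a new segment starts here
        prev = state
    return counts


if __name__ == "__main__":
    predicted = "CCHHHHCCEEEC"  # hypothetical predicted secondary structure
    for state, (x, y) in zip(predicted, cgr_points(predicted)):
        print(f"{state} {x:.4f} {y:.4f}")
    print(segment_counts(predicted))
```

Because every point encodes the states that came before it, summary statistics of the point coordinates carry sequence order information, while run counts such as those from `segment_counts` describe how the segments are distributed. The actual feature set used in the study may differ.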