Abdollah Dehzangi, Kuldip Paliwal, Alok Sharma, Omid Dehzangi, Abdul Sattar
Griffith University and National ICT Australia (NICTA), Brisbane, Australia.
IEEE/ACM Trans Comput Biol Bioinform. 2013 May-Jun;10(3):564-75. doi: 10.1109/TCBB.2013.65.
Understanding the structural class of a given protein reveals important information about its overall folding type and its domains. It can also be used directly to provide critical information about the general tertiary structure of a protein, which has a profound impact on protein function determination and drug design. Despite the considerable progress made by pattern recognition-based approaches, protein structural class prediction remains an unsolved problem in bioinformatics that demands further attention and exploration. In this study, we propose a novel feature extraction model that incorporates physicochemical and evolutionary-based information simultaneously. We also propose overlapped segmented distribution and autocorrelation-based feature extraction methods to provide more local and global discriminatory information. The proposed feature extraction methods are explored for the 15 most promising attributes, selected from a wide range of physicochemical-based attributes. Finally, by applying an ensemble of different classifiers, namely Adaboost.M1, LogitBoost, naive Bayes, multilayer perceptron (MLP), and support vector machine (SVM), we show improved protein structural class prediction accuracy on four popular benchmarks.
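The abstract names two feature extraction methods, overlapped segmented distribution and autocorrelation, without defining them. The Python sketch below shows one plausible reading applied to a single per-residue attribute profile. It is not the authors' exact formulation: the function names are hypothetical, and the min-max normalisation, the 25/50/75% thresholds, the lag range, and the use of the Kyte-Doolittle hydropathy scale as a stand-in for one of the 15 selected attributes are all assumptions. The autocorrelation uses the common form AC_k = (1/(L-k)) Σ x_i x_{i+k}.

```python
from typing import Dict, List, Sequence

# Kyte-Doolittle hydropathy scale, used here only as an illustrative stand-in
# for one of the paper's 15 selected physicochemical attributes (assumption).
KYTE_DOOLITTLE: Dict[str, float] = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def min_max_normalise(values: Sequence[float]) -> List[float]:
    """Rescale values to [0, 1] so that running sums never decrease."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def autocorrelation(profile: Sequence[float], max_lag: int) -> List[float]:
    """AC_k = (1 / (L - k)) * sum_{i=0}^{L-k-1} x_i * x_{i+k}, for k = 1..max_lag."""
    n = len(profile)
    if not 1 <= max_lag < n:
        raise ValueError("max_lag must be in [1, len(profile) - 1]")
    return [
        sum(profile[i] * profile[i + k] for i in range(n - k)) / (n - k)
        for k in range(1, max_lag + 1)
    ]


def overlapped_segmented_distribution(
    profile: Sequence[float], fractions: Sequence[float] = (0.25, 0.50, 0.75)
) -> List[float]:
    """Hypothetical reading: for each fraction f, the relative position at which
    the running sum, scanned from the N-terminus and separately from the
    C-terminus, first reaches f * total. Because each scan covers up to 75% of
    the total, the prefix and suffix segments overlap."""
    if any(v < 0 for v in profile):
        raise ValueError("expects non-negative values, e.g. min-max normalised")
    n, total = len(profile), sum(profile)
    if total == 0:
        return [0.0] * (2 * len(fractions))

    def scan(values: Sequence[float]) -> List[float]:
        out = []
        for f in fractions:
            running = 0.0
            for idx, v in enumerate(values, start=1):
                running += v
                if running >= f * total:
                    out.append(idx / n)
                    break
            else:  # guard against float round-off when f is close to 1
                out.append(1.0)
        return out

    return scan(profile) + scan(list(reversed(profile)))


def extract_features(sequence: str, scale: Dict[str, float], max_lag: int) -> List[float]:
    """Map residues to attribute values, normalise, then concatenate both feature groups."""
    profile = min_max_normalise([scale[aa] for aa in sequence])
    return overlapped_segmented_distribution(profile) + autocorrelation(profile, max_lag)


if __name__ == "__main__":
    features = extract_features("MKTAYIAKQRQISFVKSHFSRQ", KYTE_DOOLITTLE, max_lag=3)
    print(len(features), [round(x, 3) for x in features])
```

Running the script prints a 9-element vector for the toy sequence: 6 segmented-distribution values (3 thresholds from each end) followed by 3 autocorrelation values. In a pipeline like the one the abstract describes, such vectors would be computed for each selected attribute and combined with evolutionary information (for example, PSSM-derived features) before being passed to the classifier ensemble; that combination step is not shown here.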