Ibrahim Wisam, Abadeh Mohammad Saniee
Faculty of Electrical and Computer Engineering, Tarbiat Modares University, Tehran, Iran.
J Theor Biol. 2017 May 21;421:1-15. doi: 10.1016/j.jtbi.2017.03.023. Epub 2017 Mar 27.
Protein fold recognition is an important problem in bioinformatics for predicting the three-dimensional structure of a protein. One of the most challenging tasks in protein fold recognition is extracting effective features from amino-acid sequences to obtain better classifiers. In this paper, we propose six descriptors to extract features from protein sequences. These descriptors are applied in the first stage of a three-stage framework, PCA-DELM-LDA, to extract feature vectors from the amino-acid sequences. Principal Component Analysis (PCA) is used to reduce the number of extracted features. The extracted feature vectors are used together with the original features to improve the performance of the Deep Extreme Learning Machine (DELM) in the second stage. Four new features are extracted in the second stage and used in the third stage by Linear Discriminant Analysis (LDA) to classify the instances into 27 folds. The proposed framework is evaluated on the independent and combined feature sets of the SCOP datasets. The experimental results show that the feature vectors extracted in the first stage can improve the performance of DELM in extracting useful new features in the second stage.
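The three-stage flow described in the abstract (PCA reduction, a learned feature stage, then LDA classification) can be illustrated with a minimal sketch. This is not the authors' implementation: the data are synthetic (three classes rather than 27 folds), a single-hidden-layer extreme learning machine with ridge-regularized output weights stands in for the DELM, and the helper names `elm_fit`, `elm_transform`, and `run_pipeline`, along with all sizes and hyperparameters, are assumptions chosen for illustration. The six sequence descriptors are not reproduced here.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split


def elm_fit(X, Y_onehot, n_hidden=64, ridge=1e-2, seed=0):
    """Fit a single-hidden-layer ELM (a simplified stand-in for a deep ELM)."""
    rng = np.random.default_rng(seed)
    # Hidden weights are random and never trained; scaled to limit tanh saturation.
    W = rng.normal(size=(X.shape[1], n_hidden)) / np.sqrt(X.shape[1])
    b = rng.normal(size=n_hidden)
    H = np.tanh(X @ W + b)
    # Output weights come from a ridge-regularized least-squares solve.
    beta = np.linalg.solve(H.T @ H + ridge * np.eye(n_hidden), H.T @ Y_onehot)
    return W, b, beta


def elm_transform(X, params):
    """Map inputs to ELM output scores, used here as new features."""
    W, b, beta = params
    return np.tanh(X @ W + b) @ beta


def run_pipeline(n_classes=3, seed=0):
    # Synthetic stand-in for descriptor-based feature vectors.
    X, y = make_classification(
        n_samples=600, n_features=40, n_informative=10,
        n_classes=n_classes, random_state=seed,
    )
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, random_state=seed, stratify=y
    )

    # Stage 1: PCA-reduced features, concatenated with the original features.
    pca = PCA(n_components=10).fit(X_tr)
    Z_tr = np.hstack([X_tr, pca.transform(X_tr)])
    Z_te = np.hstack([X_te, pca.transform(X_te)])

    # Stage 2: the ELM produces a small set of new features (one per class here).
    params = elm_fit(Z_tr, np.eye(n_classes)[y_tr], seed=seed)
    F_tr = elm_transform(Z_tr, params)
    F_te = elm_transform(Z_te, params)

    # Stage 3: LDA classifies using the stage-2 features.
    lda = LinearDiscriminantAnalysis().fit(F_tr, y_tr)
    return F_te.shape, lda.score(F_te, y_te)


if __name__ == "__main__":
    shape, accuracy = run_pipeline()
    print("stage-2 feature shape:", shape, "test accuracy:", accuracy)
```

In this sketch the number of stage-2 features equals the number of classes, which is a convenience of the one-hot regression target; how the paper arrives at its four features is not stated in the abstract and is not modeled here.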