Department of Electrical Engineering, Amirkabir University of Technology, Tehran, Iran.
Iranian Research Institute for Information Science and Technology (IranDoc), Tehran, Iran.
Sci Rep. 2020 Sep 1;10(1):14368. doi: 10.1038/s41598-020-71172-x.
Protein fold recognition plays a crucial role in discovering the three-dimensional structure of proteins and their functions. Several approaches have been employed for the prediction of protein folds. Some of these approaches are based on extracting features from protein sequences and using a strong classifier. Feature extraction techniques generally draw on syntactical, evolutionary, and physicochemical information. In recent years, finding an efficient technique for integrating discriminative features has received increasing attention. In this study, we integrate the Auto-Cross-Covariance and Separated dimer evolutionary feature extraction methods. The resulting features are scored by Information gain to select several discriminative features. On three benchmark datasets, DD, RDD, and EDD, the support vector machine results show more than 6% improvement in accuracy.
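As an illustration of the two evolutionary feature extractors named above, the following is a minimal sketch, not the authors' implementation. The abstract does not state the formulas, so the code assumes the formulations commonly used with position-specific scoring matrices (PSSMs): auto-cross-covariance as the averaged product of mean-centered columns at a given lag, and separated dimers as the summed product of column probabilities at a given gap. The function names, the `max_lag` and `max_gap` parameters, and the row-per-residue matrix layout are hypothetical choices made for this example.

```python
from typing import List

# Assumed layout: one row per residue, one column per amino acid type.
Matrix = List[List[float]]


def acc_features(pssm: Matrix, max_lag: int) -> List[float]:
    """Auto-cross-covariance features (assumed formulation).

    For each lag g and each column pair (a, b), average
    (S[j][a] - mean_a) * (S[j + g][b] - mean_b) over j.
    Pairs with a == b are the auto-covariance terms, the rest cross-covariance.
    """
    length, width = len(pssm), len(pssm[0])
    if not 0 < max_lag < length:
        raise ValueError("max_lag must satisfy 0 < max_lag < sequence length")
    means = [sum(row[c] for row in pssm) / length for c in range(width)]
    centered = [[row[c] - means[c] for c in range(width)] for row in pssm]
    features = []
    for lag in range(1, max_lag + 1):
        span = length - lag
        for a in range(width):
            for b in range(width):
                total = sum(centered[j][a] * centered[j + lag][b] for j in range(span))
                features.append(total / span)
    return features


def separated_dimer_features(prob_pssm: Matrix, max_gap: int) -> List[float]:
    """Separated-dimer features (assumed formulation).

    For each gap k and each column pair (a, b), sum
    P[j][a] * P[j + k][b] over j, where P holds per-residue probabilities.
    """
    length, width = len(prob_pssm), len(prob_pssm[0])
    if not 0 < max_gap < length:
        raise ValueError("max_gap must satisfy 0 < max_gap < sequence length")
    features = []
    for gap in range(1, max_gap + 1):
        for a in range(width):
            for b in range(width):
                features.append(
                    sum(prob_pssm[j][a] * prob_pssm[j + gap][b] for j in range(length - gap))
                )
    return features


def combined_features(pssm: Matrix, prob_pssm: Matrix, max_lag: int, max_gap: int) -> List[float]:
    """Concatenate both feature groups into one vector for later scoring and selection."""
    return acc_features(pssm, max_lag) + separated_dimer_features(prob_pssm, max_gap)
```

With a 20-column PSSM, each lag or gap contributes 400 values per extractor, so the combined vector grows quickly. That growth is one plausible reason a scoring step such as Information gain is applied before the classifier.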