PLP_FS: prediction of lysine phosphoglycerylation sites in protein using support vector machine and fusion of multiple F_Score feature selection.

Author information

Dept. of Computer Science and Engineering, Rajshahi University of Engineering and Technology, Rajshahi, Bangladesh.

Dept. of Computer Science and Engineering, Hajee Mohammad Danesh Science and Technology University, Dinajpur, Bangladesh.

Publication information

Brief Bioinform. 2022 Sep 20;23(5). doi: 10.1093/bib/bbac306.

DOI:10.1093/bib/bbac306

PMID:35929355

Abstract

Phosphoglycerylation, a recently discovered post-translational modification (PTM), plays an essential role in the structure and function of proteins and in serious human diseases. Understanding the molecular mechanism behind phosphoglycerylation is therefore important for developing drugs against related diseases. However, accurately identifying phosphoglycerylation sites in a protein sequence experimentally is difficult and challenging, so an efficient computational model is highly desirable. Only a few computational models are currently available for identifying phosphoglycerylation sites, and their predictive performance is not yet satisfactory. In this study, an effective predictor named PLP_FS was therefore designed and constructed to identify phosphoglycerylation sites. For training, features were generated by three types of sequence-based feature extraction methods, an optimal feature subset was selected by fusing multiple F_Score feature selection techniques, and the selected features were used to fit a support vector machine classifier. In addition, the k-nearest neighbor cleaning and SMOTE methods were applied to balance the benchmark dataset. In 10-fold cross-validation, the proposed model obtained an accuracy of 99.22%, a sensitivity of 98.17% and a specificity of 99.75%, which are better than those of other currently available predictors for identifying phosphoglycerylation sites.
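To make the feature-selection step more concrete, the following is a minimal sketch of ranking features by a single F-score, assuming the common two-class definition (squared distances of the class means from the overall mean, divided by the sum of the within-class sample variances). The abstract does not say which F_Score variants PLP_FS fuses or how they are combined, so this is not the authors' procedure; the function names `f_score` and `rank_features` and the toy data are hypothetical.

```python
from statistics import mean, variance


def f_score(pos, neg):
    """F-score of one feature from its values in the positive and negative classes.

    Assumes the common two-class definition; each class needs at least
    two samples so that the sample variance is defined.
    """
    overall = mean(pos + neg)
    # Between-class term: squared distance of each class mean from the overall mean.
    between = (mean(pos) - overall) ** 2 + (mean(neg) - overall) ** 2
    # Within-class term: sample variances (n - 1 in the denominator).
    within = variance(pos) + variance(neg)
    return between / within if within > 0 else 0.0


def rank_features(X, y):
    """Return (feature indices sorted by decreasing F-score, list of scores)."""
    n_features = len(X[0])
    scores = []
    for j in range(n_features):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append(f_score(pos, neg))
    order = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return order, scores


# Hypothetical toy data: feature 0 separates the classes, feature 1 does not.
X = [[1.0, 5.0], [2.0, 4.0], [8.0, 5.0], [9.0, 4.0]]
y = [1, 1, 0, 0]
order, scores = rank_features(X, y)
print(order, scores)
```

In a pipeline like the one described, the top-ranked features would then be passed to the classifier; how many to keep is a tuning choice that the abstract does not specify.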

