Prediction of lysine formylation sites using support vector machine based on the sample selection from majority classes and synthetic minority over-sampling techniques.

Affiliations

Dept. of Computer Science and Engineering, Rajshahi University of Engineering and Technology, Rajshahi, Bangladesh; Dept. of Computer Science and Engineering, Hajee Mohammad Danesh Science and Technology University, Dinajpur, Bangladesh.

Dept. of Computer Science and Engineering, Rajshahi University of Engineering and Technology, Rajshahi, Bangladesh.

Publication information

Biochimie. 2022 Jan;192:125-135. doi: 10.1016/j.biochi.2021.10.001. Epub 2021 Oct 7.

DOI:10.1016/j.biochi.2021.10.001

PMID:34627982

Abstract

Lysine formylation is a recently discovered post-translational modification (PTM) that has attracted considerable interest. It is generally found on core and linker histone proteins of prokaryotes and eukaryotes and plays important roles in regulating various cellular mechanisms. Identifying formylation sites in proteins is therefore essential for understanding the molecular mechanism of formylation and for designing drugs for related diseases. Because experimental identification of formylation sites with traditional methods is expensive and time-consuming, a simple and fast computational model that accurately predicts lysine formylation sites is highly desirable. This study proposes such a model, named PLF_SVM, which predicts formylated and non-formylated lysine sites using features generated by binary encoding (BE), amino acid composition (AAC), reverse position relative incidence matrix (RPRIM), position relative incidence matrix (PRIM), and position specific amino acid propensity (PSAAP). The Synthetic Minority Oversampling Technique (SMOTE) and a proposed sample selection strategy named EnSVM are applied to handle the imbalanced training dataset. The optimal number of features is then selected by the F-score method to train the model. PLF_SVM outperforms state-of-the-art approaches in validation and independent testing, with accuracies of 98.61% and 98.77%, respectively. A user-friendly web tool for identifying formylation sites is also available at https://plf-svm.herokuapp.com/. The proposed method may therefore serve as a helpful guide for analyzing and predicting formylated lysine and for understanding the process of cellular regulation.
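As a rough illustration of two of the feature encodings and the feature-ranking step named in the abstract, the following minimal sketch shows binary encoding, amino acid composition, and an F-score computation. It is not the authors' implementation: the function names, the 20-letter alphabet ordering, the treatment of non-standard symbols, and the particular F-score definition (squared deviation of class means from the overall mean, divided by the sum of the within-class sample variances) are all assumptions made for this example.

```python
from statistics import mean, variance

# Assumed ordering of the 20 standard amino acids (not taken from the paper).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def binary_encode(window):
    """One-hot encode each residue of a peptide window.

    Symbols outside the 20 standard residues (for example a padding 'X')
    become an all-zero block; this handling is an assumption.
    """
    vec = []
    for res in window:
        vec.extend(1.0 if res == aa else 0.0 for aa in AMINO_ACIDS)
    return vec


def aac(window):
    """Amino acid composition: fraction of each standard residue in the window."""
    n = len(window)
    return [window.count(aa) / n for aa in AMINO_ACIDS]


def f_score(pos, neg):
    """Per-feature F-score for two classes of feature vectors.

    pos and neg are lists of equal-length feature vectors, each class
    holding at least two samples. Features whose within-class variance
    is zero in both classes get a score of 0.0.
    """
    scores = []
    for j in range(len(pos[0])):
        p = [row[j] for row in pos]
        q = [row[j] for row in neg]
        overall = mean(p + q)
        between = (mean(p) - overall) ** 2 + (mean(q) - overall) ** 2
        within = variance(p) + variance(q)  # sample variances (n - 1)
        scores.append(between / within if within > 0 else 0.0)
    return scores
```

In a pipeline of the kind the abstract describes, vectors such as these would be concatenated per lysine-centered window, ranked by F-score, and the top-ranked features passed to a support vector machine; the class-balancing steps (SMOTE and EnSVM) are not shown here.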

