Department of Computer Engineering, University of Science and Technology of Mazandaran, Behshahr, Iran.
Department of Computer Engineering, Faculty of Information Technology, Kermanshah University of Technology, Kermanshah, Iran.
Sci Rep. 2022 Apr 6;12(1):5756. doi: 10.1038/s41598-022-08555-9.
Lysine malonylation is one of the most important post-translational modifications (PTMs) and affects cellular function. Predicting malonylation sites in proteins can help uncover the mechanisms underlying cellular functions. Experimental methods are one of the main approaches to identifying these sites, but they are typically costly and time-consuming. Recently, machine-learning methods have been proposed to tackle this problem, and they have been shown to reduce cost and time complexity while increasing accuracy. However, these approaches also have specific shortcomings, including inappropriate feature extraction from protein sequences, high-dimensional features, and inefficient underlying classifiers. This paper proposes a machine learning-based method to address these problems. In the proposed approach, seven different features are extracted. The extracted features are then combined, ranked by Fisher's score (F-score), and the most effective ones are selected. Afterward, malonylation sites are predicted using various classifiers. Simulation results show that the proposed method performs acceptably compared with several state-of-the-art approaches. In addition, the XGBOOST classifier, built on extracted features such as TFCRF, achieves a higher prediction rate than the other methods. The code is publicly available at: https://github.com/jimy2020/Malonylation-site-prediction.