A hybrid feature extraction scheme for efficient malonylation site prediction.

Affiliations

Department of Computer Engineering, University of Science and Technology of Mazandaran, Behshahr, Iran.

Department of Computer Engineering, Faculty of Information Technology, Kermanshah University of Technology, Kermanshah, Iran.

Publication information

Sci Rep. 2022 Apr 6;12(1):5756. doi: 10.1038/s41598-022-08555-9.

Abstract

Lysine malonylation is one of the most important post-translational modifications (PTMs) and affects how cells function. Predicting malonylation sites in proteins can help uncover the mechanisms behind cellular functions. Experimental methods are one of the main approaches for identifying these sites, but they are typically costly and time-consuming. Recently, machine-learning methods have been proposed to tackle this problem, and they have been shown to reduce cost and time while increasing accuracy. However, these approaches have shortcomings of their own, including inappropriate feature extraction from protein sequences, high-dimensional features, and inefficient underlying classifiers. This paper proposes a machine learning-based method to address these problems. In the proposed approach, seven different features are extracted. The extracted features are then combined and ranked by Fisher's score (F-score), and the most efficient ones are selected. Malonylation sites are then predicted using various classifiers. Simulation results show that the proposed method performs acceptably compared with some state-of-the-art approaches. In addition, the XGBoost classifier, built on extracted features such as TFCRF, has a higher prediction rate than the other methods. The code is publicly available at: https://github.com/jimy2020/Malonylation-site-prediction.
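The ranking-and-selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the formula, so this assumes the commonly used two-class F-score (squared distances of the class means from the overall mean, divided by the sum of the per-class sample variances). The function names, the toy feature matrix, and the choice of `k` are hypothetical and are not taken from the authors' repository.

```python
from statistics import mean, variance


def f_score(pos, neg):
    """Two-class F-score of one feature (assumed definition, see lead-in).

    pos and neg hold the feature's values in the positive and negative
    samples; each list needs at least two values for the sample variance.
    """
    overall = mean(pos + neg)
    between = (mean(pos) - overall) ** 2 + (mean(neg) - overall) ** 2
    within = variance(pos) + variance(neg)  # sample variances (n - 1)
    return between / within if within > 0 else 0.0


def rank_features(X, y):
    """Return (feature indices sorted by decreasing F-score, raw scores)."""
    n_features = len(X[0])
    scores = []
    for j in range(n_features):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append(f_score(pos, neg))
    order = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return order, scores


def select_top_k(X, order, k):
    """Keep only the k highest-ranked feature columns, in ranked order."""
    keep = order[:k]
    return [[row[j] for j in keep] for row in X]


# Hypothetical toy data: 4 samples, 3 combined features, labels 1 = modified site.
X = [
    [0.9, 0.5, 0.1],
    [0.8, 0.4, 0.9],
    [0.1, 0.5, 0.2],
    [0.2, 0.6, 0.8],
]
y = [1, 1, 0, 0]

order, scores = rank_features(X, y)
X_selected = select_top_k(X, order, k=2)
```

In this toy data the first column separates the two classes cleanly and the third does not separate them at all, so the ranking puts them first and last. The reduced matrix `X_selected` would then be passed to whichever classifier is being evaluated.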
