Malebary Sharaf, Rahman Shaista, Barukab Omar, Ash'ari Rehab, Khan Sher Afzal
Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah 21911, Saudi Arabia.
Department of Computer Science, Abdul Wali Khan University Mardan, Mardan 23200, Pakistan.
Membranes (Basel). 2022 Feb 25;12(3):265. doi: 10.3390/membranes12030265.
Acetylation is the most important post-translational modification (PTM) in eukaryotes: it transfers an acetyl group from acetyl coenzyme A (acetyl-CoA) to a specific site on a polypeptide chain and has manifold effects at the protein level. Acetylation sites play many important roles, including regulating membrane protein functions and strongly affecting both the membrane interactions of proteins and membrane remodeling. Because of these roles, correctly identifying acetylation sites is essential to understanding the mechanisms of acetylation in biological systems. Experimental methods such as mass spectrometry and site-directed mutagenesis can identify these sites, but they are tedious and time-consuming. To overcome these limitations, many computational models have been developed to distinguish acetylation sequences from non-acetylation sequences, but their accuracy, sensitivity, and specificity remain poor. This work proposes an efficient and accurate computational model for predicting acetylation sites using machine learning. Combining a Random Forest classifier with a feature extraction approach based on statistical moments, the proposed model achieved an accuracy of 100% in 10-fold cross-validation. The model was further validated with the jackknife, self-consistency, and independent tests, achieving accuracies of 100%, 100%, and 97%, respectively; these results are far better than those of existing models reported in the literature.
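The feature-extraction and validation steps summarized above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the residue-to-integer encoding (`ALPHABET`, `CODE`), the row-major k×k matrix layout, the restriction to raw and central moments up to order 3, and all helper names (`to_matrix`, `raw_moment`, `central_moment`, `moment_features`, `k_fold_indices`) are hypothetical choices made for illustration.

```python
import math
from itertools import product

# Hypothetical residue encoding (an assumption, not taken from the paper):
# each standard amino acid maps to its 1-based position in this alphabet,
# and any other symbol (e.g. 'X') maps to 0.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"
CODE = {aa: i + 1 for i, aa in enumerate(ALPHABET)}


def to_matrix(seq):
    """Fill a k x k matrix row by row with residue codes, k = ceil(sqrt(len(seq))).

    Cells past the end of the sequence are padded with 0.
    """
    k = math.ceil(math.sqrt(len(seq)))
    values = [CODE.get(aa, 0) for aa in seq.upper()]
    values += [0] * (k * k - len(values))
    return [values[r * k:(r + 1) * k] for r in range(k)]


def raw_moment(m, p, q):
    """Raw moment M_pq = sum_i sum_j i^p * j^q * m[i][j], with 1-based i, j."""
    n = len(m)
    return sum((i + 1) ** p * (j + 1) ** q * m[i][j]
               for i in range(n) for j in range(n))


def central_moment(m, p, q):
    """Central moment about the centroid x_bar = M10/M00, y_bar = M01/M00."""
    m00 = raw_moment(m, 0, 0)
    if m00 == 0:
        raise ValueError("sequence contains no standard residues")
    x_bar = raw_moment(m, 1, 0) / m00
    y_bar = raw_moment(m, 0, 1) / m00
    n = len(m)
    return sum((i + 1 - x_bar) ** p * (j + 1 - y_bar) ** q * m[i][j]
               for i in range(n) for j in range(n))


def moment_features(seq, max_order=3):
    """Concatenate raw and central moments for every order with p + q <= max_order."""
    m = to_matrix(seq)
    orders = [(p, q) for p, q in product(range(max_order + 1), repeat=2)
              if p + q <= max_order]
    return ([raw_moment(m, p, q) for p, q in orders]
            + [central_moment(m, p, q) for p, q in orders])


def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds; k == n gives the jackknife."""
    sizes = [n // k + (1 if f < n % k else 0) for f in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


if __name__ == "__main__":
    features = moment_features("ACDEF")
    print(len(features), features[:4])
    print(k_fold_indices(23, 10))
```

In practice, the feature vectors would be passed to a Random Forest implementation such as scikit-learn's `RandomForestClassifier` and scored with `cross_val_score(..., cv=10)`. Setting the number of folds equal to the number of samples (for example with scikit-learn's `LeaveOneOut`) gives the jackknife test, while the self-consistency test scores the model on the same data it was trained on. The folds above are contiguous for simplicity; shuffled or stratified folds are the usual choice for real data.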