iAcety-SmRF: Identification of Acetylation Protein by Using Statistical Moments and Random Forest.

Author information

Malebary Sharaf, Rahman Shaista, Barukab Omar, Ash'ari Rehab, Khan Sher Afzal

Affiliations

Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah 21911, Saudi Arabia.

Department of Computer Science, Abdul Wali Khan University Mardan, Mardan 23200, Pakistan.

Publication information

Membranes (Basel). 2022 Feb 25;12(3):265. doi: 10.3390/membranes12030265.

DOI: 10.3390/membranes12030265

PMID: 35323738

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8955084/

Abstract

Acetylation is the most important post-translational modification (PTM) in eukaryotes; it transfers an acetyl group from acetyl coenzyme A to a specific site on a polypeptide chain and has manifold effects at the protein level. Acetylation sites play many important roles, including regulating membrane protein functions and strongly affecting both the membrane interaction of proteins and membrane remodeling. Because of these roles, correctly identifying acetylation is essential to understanding its mechanism in biological systems. Traditional methods such as mass spectrometry and site-directed mutagenesis are used for this purpose, but they are tedious and time-consuming. To overcome these limitations, many computational models are being developed to distinguish acetylated sequences from non-acetylated ones, but they perform poorly in terms of accuracy, sensitivity, and specificity. This work proposes an efficient and accurate computational model for predicting acetylation using machine learning. The proposed model, which combines a Random Forest classifier with feature extraction based on statistical moments, achieved an accuracy of 100% in the 10-fold cross-validation test. The model was also validated with jackknife, self-consistency, and independent tests, achieving accuracies of 100%, 100%, and 97%, respectively, results far better than those of the existing models in the literature.
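To make the idea of statistical-moment features more concrete, the following is a minimal sketch, not the authors' implementation. It assumes a simple integer encoding of the 20 standard amino acids (the abstract does not specify the encoding) and computes raw and central moments of the encoded sequence. The names `encode`, `moment_features`, and `binary_metrics` are hypothetical helpers introduced only for illustration. The last helper shows how accuracy, sensitivity, and specificity are conventionally derived from binary predictions.

```python
from typing import Dict, List, Sequence

# Assumption: each residue is encoded by its 1-based position in this alphabet.
# The encoding actually used in the paper is not stated in the abstract.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
CODE: Dict[str, int] = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}


def encode(sequence: str) -> List[int]:
    """Map each standard residue to an integer; unknown symbols are skipped."""
    return [CODE[aa] for aa in sequence.upper() if aa in CODE]


def raw_moment(values: Sequence[float], order: int) -> float:
    """k-th raw moment: the mean of x**k."""
    return sum(v ** order for v in values) / len(values)


def central_moment(values: Sequence[float], order: int) -> float:
    """k-th central moment: the mean of (x - mean)**k."""
    mean = raw_moment(values, 1)
    return sum((v - mean) ** order for v in values) / len(values)


def moment_features(sequence: str, max_order: int = 3) -> List[float]:
    """Raw moments of order 1..max_order, then central moments of order 2..max_order."""
    values = encode(sequence)
    if not values:
        raise ValueError("sequence contains no standard residues")
    raw = [raw_moment(values, k) for k in range(1, max_order + 1)]
    central = [central_moment(values, k) for k in range(2, max_order + 1)]
    return raw + central


def _ratio(numerator: int, denominator: int) -> float:
    """Return numerator/denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity, and specificity for labels 1 (acetylated) and 0 (not)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": _ratio(tp + tn, len(pairs)),
        "sensitivity": _ratio(tp, tp + fn),  # true positive rate
        "specificity": _ratio(tn, tn + fp),  # true negative rate
    }


if __name__ == "__main__":
    print(moment_features("MKVLAAGK"))
    print(binary_metrics([1, 1, 0, 0], [1, 0, 0, 0]))
```

In a full pipeline, fixed-length vectors like these would be passed to a Random Forest classifier and evaluated with 10-fold cross-validation, for example using scikit-learn. The feature set here is deliberately small and may differ from the one used in the paper.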
