School of Basic Medicine, Qingdao University, Qingdao 266021, China.
School of Data Science and Software Engineering, Qingdao University, Qingdao 266021, China.
Genomics Proteomics Bioinformatics. 2018 Dec;16(6):451-459. doi: 10.1016/j.gpb.2018.08.004. Epub 2019 Jan 11.
As a newly identified protein post-translational modification, malonylation is involved in a variety of biological functions. Recognizing malonylation sites in substrates is an initial but crucial step in elucidating the molecular mechanisms underlying protein malonylation. In this study, we constructed a deep learning (DL) network classifier based on long short-term memory (LSTM) with word embedding (LSTMWE) for the prediction of mammalian malonylation sites. LSTMWE performs better than traditional classifiers developed with common pre-defined feature encodings and better than a DL classifier based on LSTM with a one-hot vector. The performance of LSTMWE is sensitive to the size of the training set, but this limitation can be overcome by integration with a traditional machine learning (ML) classifier. Accordingly, an integrated approach called LEMP was developed, which combines LSTMWE with a random forest classifier that uses a novel encoding of enhanced amino acid content. LEMP not only outperforms the individual classifiers but is also superior to the currently available malonylation predictors. Additionally, it performs well at a low false positive rate, which is valuable in practical prediction applications. Overall, LEMP is a useful tool for easily identifying malonylation sites with high confidence. LEMP is available at http://www.bioinfogo.org/lemp.
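The contrast the abstract draws between a one-hot vector and a word embedding as input to the LSTM can be illustrated with a minimal sketch. It is not the authors' implementation: the example sequence, the flank size, the padding symbol `X`, the embedding dimension, and all function names are assumptions made for illustration, and the embedding table is filled with random values where a real network would learn them during training.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD = "X"  # hypothetical padding symbol for windows that run past a terminus
ALPHABET = AMINO_ACIDS + PAD
INDEX = {aa: i for i, aa in enumerate(ALPHABET)}


def window(sequence, site, flank):
    """Return the residues within `flank` positions of `site` (0-based), padded with PAD."""
    return "".join(
        sequence[i] if 0 <= i < len(sequence) else PAD
        for i in range(site - flank, site + flank + 1)
    )


def one_hot(peptide):
    """Encode each residue as a sparse 0/1 vector of length len(ALPHABET)."""
    return [[1 if INDEX[aa] == j else 0 for j in range(len(ALPHABET))] for aa in peptide]


def to_tokens(peptide):
    """Encode each residue as an integer index, the usual input to an embedding layer."""
    return [INDEX[aa] for aa in peptide]


def embed(tokens, table):
    """Look up one dense vector per token; in a real network `table` is learned."""
    return [table[t] for t in tokens]


rng = random.Random(0)
EMBED_DIM = 4  # illustrative size only, not taken from the paper
table = [[rng.uniform(-1, 1) for _ in range(EMBED_DIM)] for _ in ALPHABET]

# Hypothetical sequence with a candidate lysine (K) at 0-based position 1.
peptide = window("MKTAYIAKQR", site=1, flank=3)
sparse = one_hot(peptide)                  # 7 positions x 21 binary columns
dense = embed(to_tokens(peptide), table)   # 7 positions x EMBED_DIM real values
```

In the one-hot form every residue is equally distant from every other residue, whereas the embedding gives the network a low-dimensional, trainable representation in which similar residues can end up close together. Either matrix would then be fed to the LSTM position by position.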