Gutierrez Clair S, Kassim Alia A, Gutierrez Benjamin D, Raines Ronald T
bioRxiv. 2024 Jun 4:2024.06.03.596298. doi: 10.1101/2024.06.03.596298.
Post-translational modifications (PTMs) increase the diversity of the proteome and are vital to organismal life and therapeutic strategies. Deep learning has been used to predict PTM locations. Still, limitations in datasets and their analyses compromise success. Here we evaluate the use of known PTM sites in prediction via sequence-based deep learning algorithms. Specifically, PTM locations were encoded as a separate amino acid before sequences were encoded via word embedding and passed into a convolutional neural network that predicts the probability of a modification at a given site. Without labeling known PTMs, our model is on par with others. With labeling, however, we improved significantly upon extant models. Moreover, knowing PTM locations can increase the predictability of a different PTM. Our findings highlight the importance of PTMs for the installation of additional PTMs. We anticipate that including known PTM locations will enhance the performance of other proteomic machine learning algorithms.
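The labeling step described above, in which a residue carrying a known PTM is treated as its own amino acid symbol before embedding, could look roughly like the following minimal sketch. The token format (`S[phos]`), the vocabulary layout, and the function names are illustrative assumptions and are not taken from the preprint. The embedding layer and convolutional network are omitted.

```python
# Hypothetical sketch: represent residues with known PTMs as separate symbols.
# Nothing here is the authors' code; names and token format are assumptions.

STANDARD_RESIDUES = "ACDEFGHIKLMNPQRSTVWY"  # the 20 canonical amino acids
PAD = "<pad>"  # reserved for padding, index 0


def tokenize(sequence, known_ptms):
    """Return one token per residue.

    known_ptms maps a 0-based position to a PTM label, e.g. {2: "phos"}.
    A labeled residue becomes a distinct token such as "S[phos]".
    """
    return [
        f"{residue}[{known_ptms[i]}]" if i in known_ptms else residue
        for i, residue in enumerate(sequence)
    ]


def build_vocab(ptm_tokens):
    """Map each token to an integer index usable by an embedding layer."""
    tokens = [PAD] + list(STANDARD_RESIDUES) + sorted(ptm_tokens)
    return {token: index for index, token in enumerate(tokens)}


def encode(tokens, vocab):
    """Convert tokens to integer indices."""
    return [vocab[token] for token in tokens]


# Usage: a serine at position 2 with a known phosphorylation (made-up example).
tokens = tokenize("MKSPT", {2: "phos"})
vocab = build_vocab({"S[phos]"})
labeled_ids = encode(tokens, vocab)
unlabeled_ids = encode(tokenize("MKSPT", {}), vocab)
```

In this sketch the labeled and unlabeled encodings differ only at the modified position, so an embedding layer can learn a separate vector for the modified residue. In practice, the label of the site being predicted would presumably have to be withheld from the input to avoid leaking the answer.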