Charih François, Boulter Mullen, Biggar Kyle K, Green James R
Department of Systems and Computer Engineering, Carleton University, Ottawa, ON, Canada.
Institute of Biochemistry, Department of Biology, Carleton University, Ottawa, ON, Canada.
bioRxiv. 2025 Sep 1:2025.08.27.672583. doi: 10.1101/2025.08.27.672583.
Lysine methylation is a dynamic and reversible post-translational modification of proteins carried out by lysine methyltransferase enzymes. The role of this modification in epigenetics and gene regulation is relatively well understood, but our understanding of the extent and role of lysine methylation of non-histone substrates remains limited. Several lysine methyltransferases that methylate non-histone substrates are overexpressed in a number of cancers and are believed to be key drivers of cancer progression. There is therefore strong incentive to map the lysine methylome, as this is a key step in identifying drug targets. While numerous computational models have been developed in the last decade to identify novel lysine methylation sites, the accuracy of these models has been modest, leaving much room for improvement. In this work, we leverage recent advances in deep learning and present a transformer-based model for lysine methylation site prediction that achieves state-of-the-art accuracy. In addition, we show that other post-translational modifications of lysine are informative and that multitask learning is an effective way to integrate this prior knowledge into our lysine methylation site predictor, MethylSight 2.0. Finally, we validate our model by means of mass spectrometry experiments and identify 68 novel lysine methylation sites. This work constitutes another contribution towards the completion of a comprehensive map of the lysine methylome.