Lai Hongyan, Luo Diyu, Yang Mi, Zhu Tao, Yang Huan, Luo Xinwei, Wei Yijie, Xie Sijia, Hong Feitong, Shu Kunxian, Dao Fuying, Ding Hui
Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, Chongqing, 400065, China.
Clinical Hospital of Chengdu Brain Science Institute, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, 610054, China.
BMC Biol. 2025 Apr 7;23(1):95. doi: 10.1186/s12915-025-02202-1.
Lactylation is a newly discovered type of post-translational modification, primarily occurring on lysine (K) residues of both histones and non-histones to exert diverse effects on target proteins. Research has shown that lysine lactylation (Kla) modification is ubiquitous in different cells and participates in the determination of cell function and fate, as well as in the initiation and progression of various diseases. Precise identification of Kla sites is fundamental for elucidating their biological functions and uncovering their application potential.
Here, we proposed a novel human Kla site predictor, named PBertKla. We curated a reliable benchmark dataset with a proper sample length and sequence identity threshold, and used it to train a protein large language model with optimal hyperparameters. Extensive experimental results consistently demonstrated that our model possessed robust human Kla site prediction ability, achieving an AUC (area under the receiver operating characteristic curve) value of over 0.880 on the independent validation data. Feature visualization analysis further validated the effectiveness of the model in feature learning and representation from Kla sequences. Moreover, we benchmarked PBertKla against other cutting-edge models on an independent testing dataset from different sources, highlighting its superiority and transferability.
All results indicated that PBertKla excelled as an automatic predictor of human Kla sites, and it would advance the investigation of lactylation modifications and their significance in health and disease.
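To make two terms in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It shows how fixed-length sequence samples centered on lysine residues might be extracted, and how an AUC value is computed from labels and prediction scores. The function names, the flank size of 25, and the padding character `X` are illustrative assumptions; the abstract does not state the actual sample length or preprocessing used for PBertKla.

```python
from typing import List, Sequence, Tuple


def lysine_windows(seq: str, flank: int = 25, pad: str = "X") -> List[Tuple[int, str]]:
    """Return (1-based position, window) for every lysine (K) in seq.

    Each window has length 2 * flank + 1 with the lysine at its center.
    The flank size and padding character are assumptions for illustration.
    """
    padded = pad * flank + seq + pad * flank
    windows = []
    for i, residue in enumerate(seq):
        if residue == "K":
            # Index i in seq corresponds to index i + flank in padded,
            # so the window starts at i and spans 2 * flank + 1 residues.
            windows.append((i + 1, padded[i : i + 2 * flank + 1]))
    return windows


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a random positive sample scores higher
    than a random negative sample (ties count as one half).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy sequence and toy scores, not data from the study.
    print(lysine_windows("MKAK", flank=2))
    print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of modified from unmodified sites, which is the scale on which the reported value of over 0.880 should be read.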