Elreify Heba M, El-Samie Fathi E Abd, Dessouky Moawad I, Torkey Hanaa, El-Khamy Said E, Shalaby Wafaa A
Department of Electronics and Electrical Communications Engineering, Faculty of Electronic Engineering, Menoufia University, Menouf 32952, Egypt.
Department of Information Technology, College of Computer and Information Sciences, Princess Nourah Bint Abdulrahman University, P.O. Box 84428, Riyadh, 11671, Saudi Arabia.
Sci Rep. 2025 Aug 25;15(1):31179. doi: 10.1038/s41598-025-13178-x.
Post-translational modifications (PTMs), particularly lysine 2-hydroxyisobutyrylation (Khib), are critical regulatory mechanisms governing protein structure and function, with mounting evidence of their importance in cellular metabolism, transcriptional regulation, and pathological processes. Despite this significance, the experimental identification of Khib sites remains constrained by resource-intensive methodologies and the transient nature of these modifications. To overcome these limitations, we introduce HyLightKhib, a computational framework that uses the Light Gradient Boosting Machine (LightGBM) architecture for accurate Khib site prediction. Our approach relies on a hybrid feature extraction strategy that integrates Evolutionary Scale Modeling (ESM-2) embeddings with comprehensive Composition, Transition, and Distribution (CTD) descriptors and curated amino acid physicochemical properties, computed for fixed-length peptides of 43 amino acids. With mutual information-based feature selection, the proposed classifier outperformed contemporary algorithms, including XGBoost and CatBoost implementations. Cross-species validation on diverse organisms, including human, parasite, and rice, achieved improved Area Under the Receiver Operating Characteristic Curve (AUC-ROC) scores of 0.893, 0.876, and 0.847, respectively, outperforming existing predictors such as DeepKhib and ResNetKhib. HyLightKhib represents an advancement in computational PTM prediction, providing enhanced predictive performance and valuable biological insights with direct implications for functional proteomics and PTM-targeted therapies.
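To make the CTD part of the hybrid feature set more concrete, the following is a minimal sketch of Composition, Transition, and Distribution descriptors for a single 43-residue peptide. It is not the HyLightKhib implementation: the three-class hydrophobicity grouping, the function names `encode` and `ctd`, and the handling of unknown residues are assumptions made for illustration, since the abstract does not state which property groupings were used.

```python
from typing import Dict

# Assumed three-class hydrophobicity grouping, as commonly used for CTD
# descriptors; the abstract does not specify the groupings HyLightKhib uses.
GROUPS = {
    "1": set("RKEDQN"),   # polar
    "2": set("GASTPHY"),  # neutral
    "3": set("CLVIMFW"),  # hydrophobic
}


def encode(peptide: str) -> str:
    """Map each residue to its class label; unknown residues become '0'."""
    labels = []
    for aa in peptide.upper():
        label = "0"
        for group, members in GROUPS.items():
            if aa in members:
                label = group
                break
        labels.append(label)
    return "".join(labels)


def ctd(peptide: str) -> Dict[str, float]:
    """Return 21 CTD features (3 composition, 3 transition, 15 distribution)."""
    s = encode(peptide)
    n = len(s)
    if n < 2:
        raise ValueError("peptide must contain at least two residues")
    feats: Dict[str, float] = {}

    # Composition: fraction of residues falling in each class.
    for g in "123":
        feats[f"C_{g}"] = s.count(g) / n

    # Transition: fraction of adjacent pairs that switch between two classes.
    for a, b in (("1", "2"), ("1", "3"), ("2", "3")):
        switches = sum(1 for x, y in zip(s, s[1:]) if {x, y} == {a, b})
        feats[f"T_{a}{b}"] = switches / (n - 1)

    # Distribution: relative position of the first, 25%, 50%, 75%, and last
    # occurrence of each class along the peptide (0.0 if the class is absent).
    for g in "123":
        positions = [i + 1 for i, c in enumerate(s) if c == g]
        for q in (0, 25, 50, 75, 100):
            if not positions:
                feats[f"D_{g}_{q}"] = 0.0
                continue
            # Integer ceiling of len * q / 100, converted to a 0-based index.
            idx = max(0, (len(positions) * q + 99) // 100 - 1)
            feats[f"D_{g}_{q}"] = positions[idx] / n
    return feats


# Hypothetical 43-residue window with the candidate lysine at the center.
example = "A" * 21 + "K" + "L" * 21
features = ctd(example)
```

In a pipeline of the kind the abstract describes, a vector like `features` would be concatenated with the ESM-2 embedding and physicochemical properties of the same window before feature selection and classification; those steps are omitted here.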