Department of Computer Science, Abdul Wali Khan University, Mardan, Pakistan.
Department of Computer Science, Muslim Youth University, Islamabad, Pakistan.
Sci Rep. 2024 Sep 6;14(1):20819. doi: 10.1038/s41598-024-71568-z.
RNA modifications play an important role in cellular regulatory mechanisms by actively controlling newly formed structures, which links them to gene expression and protein production. These modifications take many forms and offer a broad view of how RNA operates and what characterizes it. Oxidation by the TET enzyme is the key modification step associated with cytosine hydroxymethylation. The effect of CR is a change in specific biochemical processes of the organism, such as gene expression and epigenetic alterations. Traditional laboratory systems for identifying 5-hydroxymethylcytosine (5hmC) samples are expensive and time-consuming compared with other methods. To address this challenge, this study proposes XGB5hmC, a machine learning model built on a robust gradient boosting algorithm (XGBoost) and combined with several residue-based sequence formulation methods to identify 5hmC samples. The outputs of six different residue-frequency-based encoding schemes were fused into a single hybrid feature vector to enhance the model's discriminative power. In addition, the proposed model incorporates SHAP (Shapley Additive Explanations) based feature selection, which supports model interpretability by highlighting the features that contribute most. Among the machine learning algorithms applied, the XGBoost ensemble model, evaluated with a tenfold cross-validation test, achieved better results than existing state-of-the-art models. The model reported an accuracy of 89.97%, sensitivity of 87.78%, specificity of 94.45%, F1-score of 0.8934, and MCC of 0.8764. These results suggest that the approach could provide valuable insights for improving medical assessment and treatment protocols, and they represent a significant advancement in RNA modification analysis.
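To make the feature-fusion and evaluation steps more concrete, the following is a minimal sketch rather than the authors' implementation. The abstract does not name the six encoding schemes, so plain k-mer frequencies for k = 1, 2, 3 stand in for them here as an assumption. The function names `kmer_frequencies`, `hybrid_vector`, and `binary_metrics` are hypothetical, and the confusion-matrix counts in the usage lines are made up purely for illustration.

```python
from itertools import product
from math import sqrt

ALPHABET = "ACGU"  # assumed RNA residue alphabet


def kmer_frequencies(seq, k):
    """Return normalized k-mer frequencies in a fixed lexicographic order."""
    seq = seq.upper().replace("T", "U")
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows containing ambiguous residues
            counts[window] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in kmers]


def hybrid_vector(seq, ks=(1, 2, 3)):
    """Fuse several frequency encodings by concatenation (assumed scheme)."""
    vec = []
    for k in ks:
        vec.extend(kmer_frequencies(seq, k))
    return vec


def binary_metrics(tp, tn, fp, fn):
    """Compute the reported metric types from confusion-matrix counts.

    All values are fractions; F1 lies in [0, 1] and MCC in [-1, 1].
    """
    def safe_div(a, b):
        return a / b if b else 0.0

    mcc_denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": safe_div(tp + tn, tp + tn + fp + fn),
        "sensitivity": safe_div(tp, tp + fn),
        "specificity": safe_div(tn, tn + fp),
        "f1": safe_div(2 * tp, 2 * tp + fp + fn),
        "mcc": safe_div(tp * tn - fp * fn, mcc_denominator),
    }


# Usage with illustrative inputs (not data from the study).
features = hybrid_vector("ACGUACGGUCA")
metrics = binary_metrics(tp=40, tn=45, fp=5, fn=10)
print(len(features), metrics)
```

A vector built this way could be passed to any classifier, including a gradient boosting model, and the metrics would then be averaged over the ten cross-validation folds. Note that F1 and MCC are naturally expressed as plain fractions rather than percentages.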