A hybrid residue based sequential encoding mechanism with XGBoost improved ensemble model for identifying 5-hydroxymethylcytosine modifications.

Affiliations

Department of Computer Science, Abdul Wali Khan University, Mardan, Pakistan.

Department of Computer Science, Muslim Youth University, Islamabad, Pakistan.

Publication information

Sci Rep. 2024 Sep 6;14(1):20819. doi: 10.1038/s41598-024-71568-z.

DOI:10.1038/s41598-024-71568-z
PMID:39242695
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11379919/
Abstract

RNA modifications play an important role in cellular regulation mechanisms, which links them to gene expression and protein production. These modifications take many forms and offer a broad view of how RNA operates and what characterizes it. Oxidation by the TET enzyme is the crucial change associated with cytosine hydroxymethylation. The effect of CR is an alteration of specific biochemical processes in the organism, such as gene expression and epigenetic changes. Traditional laboratory systems for identifying 5-hydroxymethylcytosine (5hmC) samples are expensive and time-consuming compared to other methods. To address this challenge, this paper proposes XGB5hmC, a machine learning model built on the gradient boosting algorithm XGBoost, with several residue-based formulation methods to identify 5hmC samples. The outputs of these methods were combined: six different residue-frequency-based encoding features were fused into a hybrid vector to enhance the model's discrimination capability. In addition, the proposed model incorporates SHAP (SHapley Additive exPlanations) based feature selection, which supports interpretability by highlighting the features that contribute most. Among the machine learning algorithms applied, the XGBoost ensemble model, evaluated with tenfold cross-validation, achieved better results than existing state-of-the-art models. The model reported an accuracy of 89.97%, a sensitivity of 87.78%, a specificity of 94.45%, an F1-score of 0.8934, and an MCC of 0.8764. These results could provide valuable insights for improving medical assessment and treatment protocols and represent an advance in RNA modification analysis.
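
The abstract does not spell out the six encodings or how the reported metrics were computed, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes an RNA alphabet of A, C, G, U, uses k-mer frequencies for k = 1, 2, 3 as stand-in "residue frequency" encodings, and concatenates them into one hybrid vector. The function names (`kmer_frequencies`, `hybrid_vector`, `binary_metrics`) are hypothetical. The second half shows the standard confusion-matrix definitions of accuracy, sensitivity, specificity, F1-score, and MCC; note that F1 and MCC are ratios, not percentages.

```python
from itertools import product
from math import sqrt

ALPHABET = "ACGU"  # assumed RNA alphabet; not stated in the abstract


def kmer_frequencies(seq, k):
    """Return normalized counts of every k-mer over ALPHABET, in a fixed order."""
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    windows = len(seq) - k + 1
    for i in range(windows):
        kmer = seq[i:i + k]
        if kmer in counts:  # windows with unknown symbols are skipped
            counts[kmer] += 1
    total = max(windows, 1)  # avoid division by zero for short sequences
    return [counts[km] / total for km in kmers]


def hybrid_vector(seq, ks=(1, 2, 3)):
    """Fuse several frequency encodings by simple concatenation (hypothetical choice of ks)."""
    features = []
    for k in ks:
        features.extend(kmer_frequencies(seq, k))
    return features


def _ratio(num, den):
    """Divide, returning 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def binary_metrics(tp, tn, fp, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    mcc_den = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": _ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "f1": _ratio(2 * tp, 2 * tp + fp + fn),
        "mcc": _ratio(tp * tn - fp * fn, mcc_den),
    }


if __name__ == "__main__":
    vec = hybrid_vector("ACGUACGUAC")
    print(len(vec))  # 4 + 16 + 64 features
    print(binary_metrics(tp=40, tn=45, fp=5, fn=10))
```

In a full pipeline, vectors like these would be passed to a gradient-boosted classifier and scored fold by fold under tenfold cross-validation; that step is omitted here to keep the sketch free of external dependencies.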

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b82a/11379919/743a38b710a5/41598_2024_71568_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b82a/11379919/c5da519ec802/41598_2024_71568_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b82a/11379919/f5060a77d90d/41598_2024_71568_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b82a/11379919/42516610b956/41598_2024_71568_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b82a/11379919/5dc7d3612491/41598_2024_71568_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b82a/11379919/3e9cf572eedb/41598_2024_71568_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b82a/11379919/6f122518e852/41598_2024_71568_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b82a/11379919/ff927503913f/41598_2024_71568_Fig8_HTML.jpg
