PseU-ST: A new stacked ensemble-learning method for identifying RNA pseudouridine sites.

Author information

Zhang Xinru, Wang Shutao, Xie Lina, Zhu Yuhui

Affiliation

Department of Pharmacy, The Second Hospital of Jilin University, Changchun, China.

Publication information

Front Genet. 2023 Jan 19;14:1121694. doi: 10.3389/fgene.2023.1121694. eCollection 2023.

DOI: 10.3389/fgene.2023.1121694

PMID: 36741328

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9892456/

Abstract

Pseudouridine (Ψ) is one of the most abundant RNA modifications. It occurs in a variety of RNA types and plays a significant role in many biological processes. Identifying Ψ sites is the key to studying the biochemical functions and mechanisms of Ψ, but experimental identification is time-consuming and expensive. Computational methods that accurately predict Ψ sites from RNA sequence information are therefore needed. In this study, we propose a new model, PseU-ST, to identify Ψ sites in Homo sapiens (H. sapiens), Saccharomyces cerevisiae (S. cerevisiae), and Mus musculus (M. musculus). We comprehensively tested almost all of the RNA sequence encoding schemes available in the iLearnPlus software package and, on that basis, selected the six best encoding schemes and four machine learning algorithms. For each encoding scheme, we selected the optimal features using chi-square ranking followed by incremental feature selection. We then chose the optimal feature combination and the best base-classifier combination for each species through an extensive performance comparison, and used a stacking strategy to build the predictive model. PseU-ST outperformed existing models: its accuracy was 93.64%, 87.74%, and 89.64% on the H_990, S_628, and M_944 benchmark training datasets, respectively, exceeding the best existing methods on the same datasets by 13.94%, 6.05%, and 0.26%, respectively. These results indicate that PseU-ST is a very competitive model for identifying RNA Ψ sites in H. sapiens, S. cerevisiae, and M. musculus. In addition, we found that the position-specific trinucleotide propensity based on single strand (PSTNPss) and position-specific of three nucleotides (PS3) features play an important role in Ψ site identification. The source code for PseU-ST and the data are available in our GitHub repository (https://github.com/jluzhangxinrubio/PseU-ST).
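The three steps named in the abstract (chi-square ranking, incremental feature selection, and stacking) can be illustrated with a minimal scikit-learn sketch. This is not the authors' implementation, which is in the GitHub repository above. The synthetic data, the choice of base classifiers, the logistic-regression meta-learner, the step size, and the helper name `incremental_feature_selection` are all assumptions made for illustration; the abstract does not specify which six encodings or four algorithms were chosen.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.feature_selection import chi2
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# Synthetic stand-in for an encoded RNA dataset (assumption: NOT H_990/S_628/M_944).
X, y = make_classification(n_samples=200, n_features=30, n_informative=6, random_state=0)
X = MinMaxScaler().fit_transform(X)  # chi2 requires non-negative feature values

# Step 1: rank features by chi-square score, highest first.
scores, _ = chi2(X, y)
ranking = np.argsort(scores)[::-1]


def incremental_feature_selection(X, y, ranking, step=5, cv=5):
    """Grow the feature set along the ranking; keep the size with the best CV accuracy."""
    best_k, best_acc = 0, -1.0
    for k in range(step, len(ranking) + 1, step):
        clf = LogisticRegression(max_iter=1000)
        acc = cross_val_score(clf, X[:, ranking[:k]], y, cv=cv, scoring="accuracy").mean()
        if acc > best_acc:
            best_k, best_acc = k, acc
    return best_k, best_acc


# Step 2: incremental feature selection over the chi-square ranking.
best_k, ifs_acc = incremental_feature_selection(X, y, ranking)
X_sel = X[:, ranking[:best_k]]

# Step 3: stack base classifiers under a meta-learner (all choices here are assumptions).
stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
        ("svm", SVC(probability=True, random_state=0)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
stack_acc = cross_val_score(stack, X_sel, y, cv=5, scoring="accuracy").mean()
print(f"selected {best_k} features; stacked CV accuracy = {stack_acc:.3f}")
```

For brevity, this sketch ranks and selects features on the full dataset before cross-validating the stacked model, which lets information leak from the validation folds into feature selection. A more careful evaluation would nest the selection inside each training fold or report results on a held-out test set.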


