Department of Computer Science, School of Systems and Technology, University of Management and Technology, Lahore, 54770, Pakistan.
Anal Biochem. 2023 Sep 1;676:115247. doi: 10.1016/j.ab.2023.115247. Epub 2023 Jul 10.
Pseudouridine (ψ) is reported to occur frequently in all types of RNA. This uridine modification has been shown to be essential for processes such as RNA stability and the stress response, and it has been linked to a few human diseases, including prostate cancer and anemia. Laboratory techniques such as Pseudo-seq and N3-CMC-enriched pseudouridine sequencing (CeU-Seq) are used to detect ψ sites, but they are laborious and time-consuming. The availability of sequencing data has enabled the development of computationally intelligent models that improve ψ site identification. This work presents a prediction model that identifies ψ sites using popular ensemble methods: stacking, bagging, and boosting. Features were obtained through a novel feature extraction mechanism that incorporates statistical moments, and these features were used to train the ensemble models. Cross-validation and independent-set tests were used to evaluate the performance of the trained models. The proposed model outperformed existing predictors, achieving 87% accuracy, 0.90 specificity, 0.85 sensitivity, and a Matthews correlation coefficient of 0.75. A web server has been built and is publicly available to researchers at https://taseersuleman-y-test-pseu-pred-c2wmtj.streamlit.app/.
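The four evaluation measures reported above (accuracy, specificity, sensitivity, and the Matthews correlation coefficient) are all functions of a binary confusion matrix. The following is a minimal sketch of how they are conventionally computed. The function name `binary_metrics` and the example counts are hypothetical and chosen only for illustration; they are not the study's data or code.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute standard binary-classification metrics from confusion-matrix counts.

    tp/fn: psi sites predicted correctly / missed.
    tn/fp: non-psi sites predicted correctly / wrongly flagged.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    # Sensitivity (recall): fraction of true psi sites that were detected.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: fraction of non-psi sites that were correctly rejected.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # Matthews correlation coefficient; defined as 0 when any marginal is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }


# Hypothetical counts for a balanced test set of 100 positives and 100 negatives
# (illustrative only, NOT taken from the paper).
metrics = binary_metrics(tp=85, fp=10, tn=90, fn=15)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Unlike accuracy, the Matthews correlation coefficient uses all four cells of the confusion matrix, which is why it is commonly reported alongside sensitivity and specificity for site-prediction tasks where the classes may be imbalanced.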