Key Laboratory of Grain Information Processing and Control, Ministry of Education, Henan University of Technology, Zhengzhou, 450001, People's Republic of China.
College of Information Science and Engineering, Henan University of Technology, Zhengzhou, 450001, People's Republic of China.
Interdiscip Sci. 2020 Jun;12(2):193-203. doi: 10.1007/s12539-020-00362-y. Epub 2020 Mar 13.
Pseudouridine is one of the most prevalent post-transcriptional RNA modifications. Identifying pseudouridine sites is an essential step toward understanding RNA function, structure stabilization, translation, and stability; however, high-throughput experimental techniques remain expensive and time-consuming in laboratory and biochemical work. An efficient machine learning method for identifying pseudouridine sites is therefore important for both academic research and drug development. Motivated by this, we present an effective layered ensemble model, designated iPseU-Layer, for identifying RNA pseudouridine sites. iPseU-Layer is built from three machine learning layers: a feature selection layer, a feature extraction and fusion layer, and a prediction layer. The feature selection layer reduces dimensionality and can be regarded as a data pre-processing stage. The feature extraction and fusion layer uses an ensemble of several machine learning algorithms to generate its outputs. The prediction layer applies a classic random forest to produce the final predictions. We systematically validate the model with cross-validation tests and an independent test, comparing it against current state-of-the-art models. iPseU-Layer shows promising predictive performance in terms of sensitivity, specificity, accuracy, and Matthews correlation coefficient. Collectively, these findings indicate that the iPseU-Layer framework is a feasible and effective strategy for predicting RNA pseudouridine sites.
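The four evaluation measures named in the abstract are all derived from a binary confusion matrix. The sketch below shows the standard definitions, assuming label 1 marks a pseudouridine site and label 0 a non-site. The function name `binary_metrics` and the toy labels are hypothetical, chosen for illustration only; they are not taken from the paper and do not reproduce its results.

```python
import math


def binary_metrics(y_true, y_pred):
    """Compute sensitivity, specificity, accuracy and MCC for 0/1 labels.

    Hypothetical helper for illustration; 1 = pseudouridine site (assumed).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    # Each ratio falls back to 0.0 when its denominator is zero.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0

    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {"Sn": sensitivity, "Sp": specificity, "Acc": accuracy, "MCC": mcc}


# Toy labels, made up for this example (not data from the study).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)  # {'Sn': 0.75, 'Sp': 0.75, 'Acc': 0.75, 'MCC': 0.5}
```

Unlike accuracy, the Matthews correlation coefficient uses all four confusion-matrix cells, so it stays informative when positive and negative classes are imbalanced. This is one common reason it is reported alongside the other three measures.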