Chai Di, Jia Cangzhi, Zheng Jia, Zou Quan, Li Fuyi
School of Science, Dalian Maritime University, Dalian 116026, China.
Yangtze Delta Region Institute (Quzhou), Quzhou, China.
Mol Ther Nucleic Acids. 2021 Oct 20;26:1027-1034. doi: 10.1016/j.omtn.2021.10.012. eCollection 2021 Dec 3.
5-Methylcytosine (m5C) is an important post-transcriptional modification that has been found in many types of RNA. Many studies have shown that m5C plays vital roles in biological functions such as RNA structural stability and metabolism. Computational approaches offer an efficient way to identify m5C sites from high-throughput RNA sequence data and help interpret the functional mechanisms of this important modification. This study proposes Staem5, a novel species-specific computational approach for accurately predicting RNA m5C sites in [...] and [...]. Staem5 combines a feature-fusion strategy, which leverages informative sequence profiles, with a stacking ensemble learning framework that integrates five popular machine learning algorithms. Extensive benchmarking tests demonstrated that Staem5 outperformed state-of-the-art approaches in both cross-validation and independent tests. The source code of Staem5 is publicly available at https://github.com/Cxd-626/Staem5.git.
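The abstract names two ingredients. The first is feature fusion, in which several sequence encodings are concatenated into one vector. The second is stacking, in which the scores of several base learners become the input to a meta-learner. The dependency-free Python sketch below illustrates both ideas. It is not Staem5's implementation. The encodings (one-hot plus dinucleotide composition), the fold scheme, the toy sequences, and every function and class name are hypothetical. The three stand-in base learners (logistic regression, k-nearest neighbours, nearest centroid) replace the five algorithms used in the paper and are likewise assumptions.

```python
import math
from itertools import product

NUCLEOTIDES = "ACGU"

def one_hot(seq):
    """Hypothetical encoding 1: 4 indicator bits per position (all zeros for non-ACGU)."""
    return [1.0 if base == n else 0.0 for base in seq for n in NUCLEOTIDES]

def kmer_composition(seq, k=2):
    """Hypothetical encoding 2: normalised frequency of every k-mer over ACGU."""
    kmers = ["".join(p) for p in product(NUCLEOTIDES, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - k + 1):
        if seq[i:i + k] in counts:  # skip k-mers containing N or other symbols
            counts[seq[i:i + k]] += 1
    total = max(len(seq) - k + 1, 1)
    return [counts[km] / total for km in kmers]

def fuse(seq):
    """Feature fusion by simple concatenation of the individual encodings."""
    return one_hot(seq) + kmer_composition(seq)

def sigmoid(z):
    z = max(-500.0, min(500.0, z))  # clamp to avoid math.exp overflow
    return 1.0 / (1.0 + math.exp(-z))

class LogisticRegression:
    """Stand-in learner: logistic regression fitted by stochastic gradient descent."""
    def __init__(self, lr=0.1, epochs=200):
        self.lr, self.epochs = lr, epochs
    def _prob(self, x):
        return sigmoid(sum(w * v for w, v in zip(self.w, x)) + self.b)
    def fit(self, X, y):
        self.w, self.b = [0.0] * len(X[0]), 0.0
        for _ in range(self.epochs):
            for xi, yi in zip(X, y):
                err = self._prob(xi) - yi  # gradient of the log-loss w.r.t. the logit
                self.w = [w - self.lr * err * v for w, v in zip(self.w, xi)]
                self.b -= self.lr * err
        return self
    def predict_proba(self, X):
        return [self._prob(x) for x in X]

class KNN:
    """Stand-in learner: fraction of positives among the k nearest training points."""
    def __init__(self, k=3):
        self.k = k
    def fit(self, X, y):
        self.X, self.y = X, y
        return self
    def predict_proba(self, X):
        out = []
        for x in X:
            nearest = sorted((math.dist(x, xt), yt) for xt, yt in zip(self.X, self.y))[:self.k]
            out.append(sum(yt for _, yt in nearest) / len(nearest))
        return out

class NearestCentroid:
    """Stand-in learner: score from relative distance to the two class centroids."""
    def fit(self, X, y):
        self.centroids = {}
        for label in (0, 1):
            rows = [x for x, t in zip(X, y) if t == label]
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self
    def predict_proba(self, X):
        out = []
        for x in X:
            d0, d1 = math.dist(x, self.centroids[0]), math.dist(x, self.centroids[1])
            out.append(d0 / (d0 + d1) if d0 + d1 > 0 else 0.5)
        return out

def stacking_fit(X, y, base_factories, meta_factory, n_folds=3):
    """Fit base learners, then fit a meta-learner on their out-of-fold scores."""
    folds = [0] * len(y)
    for label in (0, 1):  # stratified round-robin fold assignment
        idx = [i for i, t in enumerate(y) if t == label]
        for j, i in enumerate(idx):
            folds[i] = j % n_folds
    meta_X = [[0.0] * len(base_factories) for _ in y]
    for f in range(n_folds):
        train = [i for i in range(len(y)) if folds[i] != f]
        held = [i for i in range(len(y)) if folds[i] == f]
        for m, make in enumerate(base_factories):
            model = make().fit([X[i] for i in train], [y[i] for i in train])
            for i, p in zip(held, model.predict_proba([X[i] for i in held])):
                meta_X[i][m] = p
    bases = [make().fit(X, y) for make in base_factories]  # refit on all data
    return bases, meta_factory().fit(meta_X, y)

def stacking_predict(bases, meta, X):
    columns = [b.predict_proba(X) for b in bases]
    return meta.predict_proba([list(row) for row in zip(*columns)])

if __name__ == "__main__":
    pos = ["GCCGC", "CCGCG", "GCGCC", "CGCCG", "CCGCC", "GCCGG"]  # toy data only
    neg = ["AACAU", "UACAA", "AUCUA", "UUCAA", "AACUU", "UACUA"]
    X, y = [fuse(s) for s in pos + neg], [1] * len(pos) + [0] * len(neg)
    bases, meta = stacking_fit(X, y, [LogisticRegression, lambda: KNN(k=3), NearestCentroid], LogisticRegression)
    print(stacking_predict(bases, meta, [fuse("CGCGC"), fuse("AUCAU")]))
```

The meta-learner is trained on out-of-fold scores rather than on the scores that base learners give their own training data. This is the usual design choice in stacking, because it stops the meta-learner from rewarding whichever base model overfits most.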