Ahmed Sajid, Hossain Zahid, Uddin Mahtab, Taherzadeh Ghazaleh, Sharma Alok, Shatabda Swakkhar, Dehzangi Abdollah
Department of Computer Science and Engineering, United International University, Dhaka, Bangladesh.
Department of Natural Science, United International University, Dhaka, Bangladesh.
Comput Struct Biotechnol J. 2020 Nov 12;18:3528-3538. doi: 10.1016/j.csbj.2020.10.032. eCollection 2020.
RNA modification is an essential step towards the generation of new RNA structures, and it can alter RNA function or stability. Among the different modifications, 5-hydroxymethylcytosine (5hmC) modification of RNA exhibits significant potential in a series of biological processes. Understanding the distribution of 5hmC in RNA is essential for determining its biological function. Although conventional sequencing techniques allow broad identification of 5hmC, they are both time-consuming and resource-intensive. In this study, we propose a new computational tool called iRNA5hmC-PS to tackle this problem. To build iRNA5hmC-PS, we extract a set of novel sequence-based features called Position-Specific Gapped k-mer (PSG k-mer) to capture as much sequential information as possible. Our feature analysis shows that the proposed PSG k-mer features contain vital information for identifying 5hmC sites. We also use a group-wise feature importance calculation strategy to select a small subset of features containing maximum discriminative information. Our experimental results demonstrate that iRNA5hmC-PS substantially improves prediction performance. iRNA5hmC-PS achieves 78.3% prediction performance, which is 12.8% better than that reported in previous studies. iRNA5hmC-PS is publicly available as an online tool at http://103.109.52.8:81/iRNA5hmC-PS. Its benchmark dataset, source code, and documentation are available at https://github.com/zahid6454/iRNA5hmC-PS.
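The abstract names the PSG k-mer features but does not define them, so the following is only an illustrative sketch of one plausible reading: for every position in the sequence and every gap size, record which pair of nucleotides occurs at that position and at the position `gap + 1` further along, as a binary indicator. The function name `psg_kmer_features`, the `max_gap` parameter, the pair-based (two-nucleotide) form, and the one-hot encoding are all assumptions made for illustration and may differ from the authors' implementation in the linked repository.

```python
from itertools import product

ALPHABET = "ACGU"  # RNA nucleotides; an assumption for this sketch


def psg_kmer_features(seq, max_gap=2):
    """Hypothetical position-specific gapped pair features.

    Returns a dict mapping (position, gap, first_base, second_base)
    to 1 if that pair occurs at that position with that gap, else 0.
    """
    seq = seq.upper().replace("T", "U")  # accept DNA-style input as well
    features = {}
    for gap in range(max_gap + 1):
        # The second base sits gap + 1 positions after the first one.
        for pos in range(len(seq) - gap - 1):
            observed = (seq[pos], seq[pos + gap + 1])
            for pair in product(ALPHABET, repeat=2):
                features[(pos, gap) + pair] = int(observed == pair)
    return features


if __name__ == "__main__":
    feats = psg_kmer_features("ACGU", max_gap=1)
    active = sorted(key for key, value in feats.items() if value)
    print(len(feats), active)
```

Because each feature is tied to a position, the same nucleotide pair at two different offsets from the candidate site yields two distinct features. Under this reading, that is what separates a position-specific encoding from a plain gapped k-mer count over the whole sequence.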