School of Communications and Electronics, Jiangxi Science and Technology Normal University, Nanchang 330003, P. R. China.
J Bioinform Comput Biol. 2022 Aug;20(4):2250017. doi: 10.1142/S0219720022500172. Epub 2022 Aug 3.
RNA 5-hydroxymethylcytosine (5hmC) is an important RNA modification that plays a vital role in several biological processes. Identifying 5hmC sites is an active research topic because knowing where they occur helps clarify the biological functions of the modification. In this study, we developed a predictor called iRNA5hmC-HOC, which identifies 5hmC sites using a high-order correlation information method. To build the model, 22 classes of dinucleotide physicochemical (PC) properties were used to represent RNA sequences, and the least absolute shrinkage and selection operator (LASSO) algorithm was used to select the most discriminative features. In the jackknife test, the proposed method achieved a classification accuracy of 89.80% with a support vector machine (SVM). Compared with state-of-the-art predictors, the proposed method significantly improves classification performance, indicating that it might be a promising tool for identifying RNA 5hmC modification sites. The dataset and source code are available at https://figshare.com/articles/online_resource/iRNA5hmC-HOC/15177450.
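To make the evaluation protocol concrete, the following is a minimal sketch of encoding sequences by a dinucleotide property and scoring a classifier with the jackknife (leave-one-out) test. It is not the authors' implementation: the property values, the lagged-correlation encoding, the toy sequences, and all function names are assumptions made for illustration, a nearest-centroid classifier stands in for the SVM, and the LASSO selection step is omitted. The released code at the figshare link is the authoritative reference.

```python
from itertools import product

# Hypothetical placeholder values for ONE dinucleotide property; the paper uses
# 22 real physicochemical properties that are not listed in the abstract.
DINUCLEOTIDES = ["".join(p) for p in product("ACGU", repeat=2)]
TOY_PROPERTY = {dn: (i - 7.5) / 7.5 for i, dn in enumerate(DINUCLEOTIDES)}


def encode(seq, max_lag=2):
    """Map an RNA sequence to lagged correlations of one dinucleotide property."""
    values = [TOY_PROPERTY[seq[i:i + 2]] for i in range(len(seq) - 1)]
    features = []
    for lag in range(1, max_lag + 1):
        pairs = [values[i] * values[i + lag] for i in range(len(values) - lag)]
        features.append(sum(pairs) / len(pairs))
    return features


def nearest_centroid_predict(train_x, train_y, query):
    """Stand-in classifier (NOT the paper's SVM): pick the closer class mean."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [x for x, y in zip(train_x, train_y) if y == label]
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(query, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def jackknife_accuracy(xs, ys):
    """Leave-one-out: predict each sample with a model built from all the others."""
    correct = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if nearest_centroid_predict(train_x, train_y, xs[i]) == ys[i]:
            correct += 1
    return correct / len(xs)


# Made-up toy sequences and labels, purely to exercise the functions.
sequences = ["ACGUACGUAC", "ACGGACGUAC", "UUGCAUGCAA", "UUGCAAGCAA"]
labels = [1, 1, 0, 0]
features = [encode(s) for s in sequences]
accuracy = jackknife_accuracy(features, labels)
```

The jackknife test retrains the model once per sample, so every prediction is made on a sequence the model has not seen. This is why it is a common choice for small benchmark datasets, at the cost of one training run per sample.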