Tac Huseyin Avni, Koroglu Mustafa, Sezerman Ugur
Department of Biostatistics and Bioinformatics, Acibadem University, Istanbul, Turkey.
Department of Molecular Biology and Genetics, Acibadem University, Istanbul, Turkey.
Funct Integr Genomics. 2021 Nov;21(5-6):633-643. doi: 10.1007/s10142-021-00805-9. Epub 2021 Sep 16.
Adenosine to inosine (A-to-I) editing in RNA is involved in various biological processes, such as gene expression, alternative splicing, and mRNA degradation, that are associated with carcinogenesis and various human diseases. Accurate identification of RNA editing sites in the transcriptome is therefore valuable for research and medicine. RNA-seq is very useful for detecting RNA editing events in condition-specific cells. However, computational methods for analyzing RNA-seq data carry a considerable risk of false positives due to mapping errors. In this study, we developed a simple machine learning method that uses support vector machines trained on sequence and structure information derived from the flanking sequences of experimentally verified A-to-I editing sites to predict new A-to-I editing sites in RNA. The best performance was obtained by the model that uses the composition of the triplet sequence elements in the flanking regions of A-to-I editing sites. Using this model, the SVM classifier also performed well on experimentally verified data, with a sensitivity of 92.8%, a specificity of 77.1%, and an accuracy of 90.2%. To compare the predictive capacity of our method with that of other classifiers that use sequence information, we used human A-to-I RNA editing sites validated by Sanger sequencing. Of 58 validated editing sites, our method correctly recognized 53, an accuracy of 91.4%, outperforming the other classifiers. To our knowledge, this is the first use of the composition of the triplet sequence elements neighboring A-to-I editing sites to predict new A-to-I editing sites in RNA. The method is easy to apply and computationally inexpensive, making it a convenient and valuable choice for facilities with limited resources.
To make the method publicly available, we developed an open-source program called RDDSVM that uses support vector machines to perform predictions on candidate A-to-I RNA editing sites.
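As a rough illustration of what a triplet-composition feature vector for flanking regions could look like, the following Python sketch counts overlapping sequence triplets on each side of a candidate site. It is not the RDDSVM implementation: the function names, the use of overlapping 3-mers over the alphabet A/C/G/U, and the separate treatment of upstream and downstream flanks are all assumptions made for illustration, and the abstract does not state the flank length or the exact encoding. The last helper simply reproduces the accuracy arithmetic reported for the 58 Sanger-validated sites.

```python
from collections import Counter
from itertools import product

# Assumption: triplets are overlapping 3-mers over the RNA alphabet (64 in total).
ALPHABET = "ACGU"
TRIPLETS = ["".join(p) for p in product(ALPHABET, repeat=3)]


def triplet_composition(flank: str) -> list[float]:
    """Relative frequency of each of the 64 triplets in one flanking sequence."""
    seq = flank.upper().replace("T", "U")  # accept DNA-style input as well
    counts = Counter(seq[i:i + 3] for i in range(len(seq) - 2))
    # Only triplets made of A/C/G/U are counted; ambiguous bases are ignored.
    total = sum(counts[t] for t in TRIPLETS)
    return [counts[t] / total if total else 0.0 for t in TRIPLETS]


def site_features(upstream: str, downstream: str) -> list[float]:
    """Hypothetical 128-dimensional vector: upstream then downstream composition."""
    return triplet_composition(upstream) + triplet_composition(downstream)


def accuracy_percent(correct: int, total: int) -> float:
    """Percentage of correctly classified sites, rounded to one decimal place."""
    return round(100.0 * correct / total, 1)


if __name__ == "__main__":
    features = site_features("ACGUACGU", "GGGAAA")
    print(len(features))             # number of features per candidate site
    print(accuracy_percent(53, 58))  # accuracy on the validated sites
```

Vectors of this kind could then be passed to any standard SVM implementation, for example `sklearn.svm.SVC` in scikit-learn, although the kernel and parameters used by the authors are not given in the abstract.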