Kim Min-su, Hur Benjamin, Kim Sun
Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea.
Department of Computer Science and Engineering, Seoul National University, Seoul, Republic of Korea.
BMC Genomics. 2016 Jan 11;17 Suppl 1(Suppl 1):5. doi: 10.1186/s12864-015-2301-y.
RNA-editing is an important post-transcriptional RNA sequence modification performed by two catalytic enzymes, "ADAR" (A-to-I) and "APOBEC" (C-to-U). The biological functions of RNA-editing have been actively investigated using high-throughput sequencing technologies. Currently, RNA-editing is considered a key regulator of various cellular functions, such as protein activity, alternative splicing patterns of mRNA, and substitution of miRNA targeting sites. DARNED, a public database of RNA-DNA differences (RDDs), reports more than 300,000 RNA-editing sites detected in the human genome (hg19). Moreover, multiple studies have suggested that RNA-editing events occur under highly specific conditions. According to DARNED, 97.62 % of registered editing sites were detected in a single tissue or under a specific condition, which further supports the view that RNA-editing events are condition-specific. Because RNA-seq captures the whole landscape of the transcriptome, it is widely used for RDD prediction. However, detecting RNA-editing from RNA-seq can generate substantial numbers of false positives or artefacts. Since experimental validation is difficult at the whole-transcriptome scale, a powerful computational tool is needed to distinguish true RNA-editing events from artefacts.
We developed RDDpred, a Random Forest RDD classifier that reports potentially true RNA-editing events from RNA-seq data. RDDpred was tested on two publicly available RNA-editing datasets and successfully reproduced the RDDs reported in the two studies (90 % and 95 %, respectively) while rejecting false discoveries (negative predictive value, NPV: 75 % and 84 %, respectively).
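To make the two reported figures concrete, the sketch below computes them from confusion-matrix counts. It assumes that the reproduction rate corresponds to recall (the fraction of previously reported sites that the classifier accepts) and that NPV has its standard definition, TN / (TN + FN). The counts used here are invented for illustration only and are not taken from either study.

```python
def recall(tp: int, fn: int) -> float:
    """Fraction of known-true editing sites that the classifier accepts: TP / (TP + FN)."""
    return tp / (tp + fn)


def npv(tn: int, fn: int) -> float:
    """Negative predictive value: fraction of rejected sites that are truly negative, TN / (TN + FN)."""
    return tn / (tn + fn)


# Hypothetical counts (NOT from the RDDpred evaluation), chosen only to show the arithmetic.
tp = 90  # true editing sites accepted
fn = 10  # true editing sites wrongly rejected
tn = 30  # artefacts correctly rejected

print(f"recall = {recall(tp, fn):.2f}")  # 90 / 100 = 0.90
print(f"NPV    = {npv(tn, fn):.2f}")     # 30 / 40  = 0.75
```

Note that the same FN count (true sites the classifier rejects) appears in both formulas: lowering it raises recall and NPV together, whereas NPV also depends on how many artefacts are correctly rejected.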
RDDpred automatically compiles condition-specific training examples without experimental validation and then constructs an RDD classifier. To our knowledge, RDDpred is the first machine-learning-based automated pipeline for RDD prediction. We believe that RDDpred will be very useful and can contribute significantly to the study of condition-specific RNA-editing. RDDpred is available at http://biohealth.snu.ac.kr/software/RDDpred .