Key Laboratory of Biomedical Information Engineering of Ministry of Education, Biomedical Informatics & Genomics Center, School of Life Science and Technology, Zhongnan Hospital of Wuhan University, Wuhan 430071, China.
Department of Hematopathology, Zhongnan Hospital of Wuhan University, Wuhan 430071, China.
Bioinformatics. 2020 Sep 15;36(18):4739-4748. doi: 10.1093/bioinformatics/btaa567.
CircRNAs are an abundant class of non-coding RNAs with widespread, cell- and tissue-specific expression patterns. Previous work suggested that epigenetic features might be related to circRNA expression. However, the contribution of epigenetic changes to circRNA expression has not been investigated systematically. Here, we built a machine learning framework named CIRCScan to predict circRNA expression in various cell lines based on sequence and epigenetic features.
The prediction accuracy of the expression status models was high, with area under the receiver operating characteristic (ROC) curve values of 0.89-0.92 and false-positive rates of 0.17-0.25. Predicted expressed circRNAs were further validated with RNA-seq data. The expression-level prediction models also performed well, with normalized root-mean-square errors of 0.28-0.30, Pearson's correlation coefficient r above 0.4 in all cell lines, and Spearman's correlation coefficient ρ of 0.33-0.46. Notably, H3K79me2 ranked highly in modeling both circRNA expression status and expression levels across different cells. Further analysis in nine additional cell lines demonstrated a significant enrichment of H3K79me2 in circRNA flanking intron regions, supporting the potential involvement of H3K79me2 in the regulation of circRNA expression.
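For readers unfamiliar with the evaluation metrics reported above, the following is a minimal sketch of how each could be computed. It is not taken from the CIRCScan code base: the function names and toy data are hypothetical, and normalizing the root-mean-square error by the range of the observed values is an assumption, since the abstract does not state which normalization was used.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def roc_auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney U statistic (labels are 0/1)."""
    ranks = average_ranks(scores)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def false_positive_rate(labels, predictions):
    """FP / (FP + TN) for binary labels and binary predictions."""
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    tn = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 0)
    return fp / (fp + tn)


def nrmse(observed, predicted):
    """RMSE divided by the observed range (assumed normalization)."""
    mse = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    return math.sqrt(mse) / (max(observed) - min(observed))


def pearson_r(x, y):
    """Pearson's correlation coefficient r."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on average ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Toy data (invented for illustration only).
labels = [1, 1, 0, 1, 0, 0]                  # 1 = expressed, 0 = not expressed
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]      # hypothetical model probabilities
calls = [1 if s >= 0.5 else 0 for s in scores]

observed = [1.0, 2.0, 3.0, 4.0, 5.0]         # hypothetical expression levels
predicted = [1.5, 1.5, 3.5, 3.5, 5.5]

print("AUC:", roc_auc(labels, scores))
print("FPR:", false_positive_rate(labels, calls))
print("NRMSE:", nrmse(observed, predicted))
print("Pearson r:", pearson_r(observed, predicted))
print("Spearman rho:", spearman_rho(observed, predicted))
```

The rank-based AUC is equivalent to the fraction of (expressed, non-expressed) pairs in which the expressed circRNA receives the higher score, which is why no explicit ROC curve needs to be traced in this sketch.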
The CIRCScan assembler is freely available online for academic use at https://github.com/johnlcd/CIRCScan.
Supplementary data are available at Bioinformatics online.