Modeling circRNA expression pattern with integrated sequence and epigenetic features demonstrates the potential involvement of H3K79me2 in circRNA expression.

Author information

Key Laboratory of Biomedical Information Engineering of Ministry of Education, Biomedical Informatics & Genomics Center, School of Life Science and Technology, Zhongnan Hospital of Wuhan University, Wuhan 430071, China.

Department of Hematopathology, Zhongnan Hospital of Wuhan University, Wuhan 430071, China.

Publication information

Bioinformatics. 2020 Sep 15;36(18):4739-4748. doi: 10.1093/bioinformatics/btaa567.

Abstract

MOTIVATION

CircRNAs are an abundant class of non-coding RNAs with widespread, cell-/tissue-specific expression patterns. Previous work suggested that epigenetic features might be related to circRNA expression. However, the contribution of epigenetic changes to circRNA expression has not been investigated systematically. Here, we built a machine learning framework named CIRCScan to predict circRNA expression in various cell lines based on sequence and epigenetic features.

RESULTS

The prediction accuracy of the expression status models was high, with area under the receiver operating characteristic (ROC) curve values of 0.89-0.92 and false-positive rates of 0.17-0.25. Predicted expressed circRNAs were further validated with RNA-seq data. The expression-level prediction models also performed well, with normalized root-mean-square errors of 0.28-0.30, Pearson's correlation coefficient r above 0.4 in all cell lines, and Spearman's correlation coefficient ρ of 0.33-0.46. Notably, H3K79me2 was highly ranked in modeling both circRNA expression status and expression levels across different cell lines. Further analysis in nine additional cell lines demonstrated a significant enrichment of H3K79me2 in circRNA flanking intron regions, supporting the potential involvement of H3K79me2 in the regulation of circRNA expression.
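
For readers unfamiliar with the evaluation metrics reported above, the following is a minimal, dependency-free sketch of how each one can be computed. It is not taken from CIRCScan: all function names are hypothetical, and the abstract does not say how the root-mean-square error was normalized, so the sketch assumes normalization by the range of the observed values.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def roc_auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney U statistic (labels are 0/1)."""
    ranks = average_ranks(scores)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def false_positive_rate(labels, predictions):
    """FP / (FP + TN) for binary labels and binary predictions."""
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    tn = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 0)
    return fp / (fp + tn)


def pearson_r(x, y):
    """Pearson's correlation coefficient r."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


def nrmse(observed, predicted):
    """RMSE divided by the observed range (ASSUMED normalization)."""
    n = len(observed)
    rmse = math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)
    return rmse / (max(observed) - min(observed))
```

The first two functions apply to the expression status (classification) models, and the last three to the expression-level (regression) models. Other NRMSE conventions, such as dividing by the mean or standard deviation of the observed values, would give different numbers.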

AVAILABILITY AND IMPLEMENTATION

The CIRCScan assembler is freely available online for academic use at https://github.com/johnlcd/CIRCScan.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
