Hansen Thomas B
Department of Molecular Biology and Genetics and Interdisciplinary Nanoscience Center, Aarhus University, Aarhus, Denmark.
Front Cell Dev Biol. 2018 Mar 5;6:20. doi: 10.3389/fcell.2018.00020. eCollection 2018.
Non-coding RNA is an interesting class of gene regulators with diverse functionalities. One large subgroup of non-coding RNAs is the recently discovered class of circular RNAs (circRNAs). CircRNAs are conserved and are expressed in a tissue- and development-specific manner, although for the vast majority the functional relevance remains unclear. To identify and quantify circRNA expression, several bioinformatic pipelines have been developed to assess the catalog of circRNAs in any given total RNA sequencing dataset. We recently compared five different algorithms for circRNA detection; here, this analysis is extended to 11 algorithms. By comparing the number of circRNAs discovered and their respective sensitivity to RNase R digestion (circRNAs lack free ends and therefore resist this exonuclease, whereas linear RNAs are degraded), the sensitivity and specificity of each algorithm are evaluated. Moreover, the ability to predict circRNAs that are not derived from annotated splice sites is determined, as is the effect of eliminating low-quality and adaptor-containing reads prior to circRNA prediction. Finally, and most importantly, all possible pairwise combinations of algorithms are tested, and guidelines for algorithm complementarity are provided. In conclusion, the algorithms mostly agree on highly expressed circRNAs; however, in many cases, algorithm-specific false positives with high read counts are predicted, a problem that is resolved by using the shared output from two (or more) algorithms.
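The pairwise strategy described in the abstract (keep only circRNAs reported by two algorithms, then check how many of them behave like true circles under RNase R treatment) can be illustrated with a small sketch. The abstract specifies neither file formats nor the exact resistance criterion, so everything below is an assumption: each tool's output is taken to be already reduced to a set of back-splice junctions keyed by (chromosome, start, end, strand), and a junction is treated as RNase R resistant under a hypothetical rule (its normalized count does not drop after treatment). The function names, the `min_ratio` threshold, and the toy data are illustrative only and are not the paper's pipeline.

```python
from itertools import combinations

# Hypothetical toy data. A back-splice junction is keyed by
# (chromosome, start, end, strand); values are normalized read counts.
MOCK = {("chr1", 100, 500, "+"): 10.0, ("chr2", 200, 900, "-"): 8.0,
        ("chr3", 50, 300, "+"): 20.0}
RNASE_R = {("chr1", 100, 500, "+"): 40.0, ("chr2", 200, 900, "-"): 16.0,
           ("chr3", 50, 300, "+"): 2.0}
PREDICTIONS = {
    "toolA": {("chr1", 100, 500, "+"), ("chr3", 50, 300, "+")},
    "toolB": {("chr1", 100, 500, "+"), ("chr2", 200, 900, "-")},
    "toolC": {("chr1", 100, 500, "+"), ("chr2", 200, 900, "-"),
              ("chr3", 50, 300, "+")},
}


def is_rnase_r_resistant(junction, mock, rnase_r, min_ratio=1.0):
    """Assumed rule: a junction counts as resistant if its count after
    RNase R is at least min_ratio times its count in the mock sample."""
    before = mock.get(junction, 0.0)
    after = rnase_r.get(junction, 0.0)
    return before > 0 and after / before >= min_ratio


def resistant_fraction(junctions, mock, rnase_r):
    """Fraction of the given junctions that pass the resistance rule."""
    if not junctions:
        return 0.0
    hits = sum(is_rnase_r_resistant(j, mock, rnase_r) for j in junctions)
    return hits / len(junctions)


def evaluate_pairs(predictions, mock, rnase_r):
    """For every pair of tools, intersect their predictions and return
    (number of shared junctions, resistant fraction of the shared set)."""
    results = {}
    for a, b in combinations(sorted(predictions), 2):
        shared = set(predictions[a]) & set(predictions[b])
        results[(a, b)] = (len(shared), resistant_fraction(shared, mock, rnase_r))
    return results


for pair, (n_shared, frac) in evaluate_pairs(PREDICTIONS, MOCK, RNASE_R).items():
    print(pair, n_shared, round(frac, 2))
```

With this toy data, "toolA" alone has a resistant fraction of 0.5 because it reports the RNase R sensitive chr3 junction, and intersecting it with "toolB" removes that junction. Intersection trades sensitivity (fewer junctions) for specificity; whether that trade is worthwhile for a given pair is exactly what a pairwise comparison is meant to reveal.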