State Key Laboratory of Bioelectronics, School of Biological Sciences and Medical Engineering, Southeast University, Nanjing, Jiangsu, China.
Research Center for Learning Sciences, Southeast University, Nanjing, Jiangsu, China.
Bioinformatics. 2017 Jul 15;33(14):2131-2139. doi: 10.1093/bioinformatics/btx129.
Circular RNAs (circRNAs) are a class of non-coding RNAs that are widely expressed in various cell lines and tissues of many organisms. Although the exact function of many circRNAs remains largely unknown, their cell type- and tissue-specific expression implies crucial roles in many biological processes. Hence, quantifying circRNA expression from high-throughput RNA-seq data is becoming increasingly important. Although many model-based methods have been developed to quantify linear RNA expression from RNA-seq data, these methods are not applicable to circRNA quantification.
Here, we propose a novel strategy that transforms circular transcripts into pseudo-linear transcripts and estimates the expression values of both circular and linear transcripts using an existing model-based algorithm, Sailfish. The new strategy can accurately estimate the expression of both linear and circular transcripts from RNA-seq data. Several factors, such as gene length, expression level and the ratio of circular to linear transcripts, affected the quantification performance for circular transcripts. Compared with count-based tools, the new computational framework performed better in estimating circRNA expression from both simulated and real ribosomal RNA-depleted (rRNA-depleted) RNA-seq datasets. In addition, accounting for circular transcripts when quantifying expression from rRNA-depleted RNA-seq data substantially increased the accuracy of linear transcript expression estimates. The proposed strategy is implemented in a program named Sailfish-cir.
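The idea of turning a circular transcript into a pseudo-linear one can be illustrated with a minimal sketch. The abstract does not specify how the transformation is done, so the approach below is an assumption for illustration only: the first `overlap` bases of the circle are appended to its end so that k-mers spanning the back-splice junction appear in the linear sequence. The function names `pseudo_linearize` and `kmers` and the `overlap` parameter are hypothetical and are not taken from Sailfish-cir.

```python
def pseudo_linearize(circ_seq: str, overlap: int) -> str:
    """Return a linear sequence that also covers the back-splice junction.

    Assumption (not confirmed by the abstract): the junction is represented
    by appending the first `overlap` bases of the circle to its 3' end.
    """
    if overlap < 0:
        raise ValueError("overlap must be non-negative")
    if not circ_seq:
        return ""
    # Wrap around the circle as often as needed if overlap exceeds its length.
    reps = overlap // len(circ_seq) + 1
    return circ_seq + (circ_seq * reps)[:overlap]


def kmers(seq: str, k: int) -> set:
    """Collect the distinct k-mers of a linear sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


if __name__ == "__main__":
    circ = "ACGTTG"  # toy circular transcript, read 5'->3' from the junction
    k = 3
    linear = pseudo_linearize(circ, overlap=k - 1)
    print(linear)
    # k-mers present only because the junction was made explicit
    print(sorted(kmers(linear, k) - kmers(circ, k)))
```

With `overlap = k - 1`, every k-mer that occurs when reading around the circle appears in the pseudo-linear sequence, which is what would let a k-mer-based linear quantifier treat the circular transcript like any other reference entry.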
Sailfish-cir is freely available at https://github.com/zerodel/Sailfish-cir .
Contact: tongz@medicine.nevada.edu or wanjun.gu@gmail.com.
Supplementary data are available at Bioinformatics online.