Nguyen Dat Thanh, Trac Quang Thinh, Nguyen Thi-Hau, Nguyen Ha-Nam, Ohad Nir, Pawitan Yudi, Vu Trung Nghia
Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden.
University of Engineering and Technology, Vietnam National University in Hanoi, Hanoi, Vietnam.
BMC Bioinformatics. 2021 Oct 13;22(1):495. doi: 10.1186/s12859-021-04418-8.
Circular RNA (circRNA) is an emerging class of RNA molecules that has attracted researchers because of its potential to serve as a marker for the diagnosis and prognosis of cancer, cardiovascular, and autoimmune diseases, or as a therapeutic target for them. Current methods for detecting circRNA from RNA sequencing (RNA-seq) data focus mostly on improving the mapping quality of reads that support the back-splicing junction (BSJ) of a circRNA, in order to eliminate false positives (FPs). We show that mapping information alone often cannot predict whether a BSJ-supporting read is derived from a true circRNA, which increases the rate of FP circRNAs.
We have developed Circall, a novel method for detecting circRNA from RNA-seq data. Circall controls FPs using a robust multidimensional local false discovery rate method based on the length and expression of circRNAs. It is computationally efficient because it uses a quasi-mapping algorithm for fast and accurate alignment of RNA reads. We applied Circall to two simulated datasets and three experimental datasets of human cell lines. The results show that Circall achieves high sensitivity and precision on the simulated data. On the experimental datasets, it performs well against current leading methods. Circall is also substantially faster than the other methods, particularly for large datasets.
With its improved performance in circRNA detection and in computation time, Circall facilitates the analysis of circRNAs in large numbers of samples. Circall is implemented in C++ and R and is available at https://www.meb.ki.se/sites/biostatwiki/circall and https://github.com/datngu/Circall.
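To make the idea of a multidimensional local false discovery rate more concrete, the following is a minimal Python sketch of a generic binned two-dimensional estimate, fdr = pi0 * f0 / f, over two features per candidate. It is not Circall's implementation, which is written in C++ and R. The function name `local_fdr_2d`, the choice of a plain histogram density estimate, the source of the null candidates, and the value of `pi0` are all assumptions made for illustration only.

```python
import numpy as np


def local_fdr_2d(null_xy, all_xy, bins=10, pi0=1.0):
    """Binned 2-D local fdr sketch: fdr = pi0 * f0 / f, capped to [0, 1].

    null_xy : (n0, 2) array of (length, expression) for assumed null candidates.
    all_xy  : (n, 2) array of (length, expression) for all candidates.
    pi0     : assumed proportion of nulls among all candidates (hypothetical).
    Returns one local fdr value per row of all_xy.
    """
    null_xy = np.asarray(null_xy, dtype=float)
    all_xy = np.asarray(all_xy, dtype=float)

    # Use shared bin edges so the two histograms are directly comparable.
    both = np.vstack([null_xy, all_xy])
    x_edges = np.linspace(both[:, 0].min(), both[:, 0].max(), bins + 1)
    y_edges = np.linspace(both[:, 1].min(), both[:, 1].max(), bins + 1)

    h0, _, _ = np.histogram2d(null_xy[:, 0], null_xy[:, 1], bins=[x_edges, y_edges])
    h, _, _ = np.histogram2d(all_xy[:, 0], all_xy[:, 1], bins=[x_edges, y_edges])

    f0 = h0 / h0.sum()  # estimated null density per bin
    f = h / h.sum()     # estimated mixture density per bin

    # Bins with no candidates are never looked up; give them fdr = 1.
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = np.where(f > 0, pi0 * f0 / f, 1.0)
    fdr_grid = np.clip(ratio, 0.0, 1.0)

    # Map every candidate back to its bin and read off the local fdr.
    ix = np.clip(np.searchsorted(x_edges, all_xy[:, 0], side="right") - 1, 0, bins - 1)
    iy = np.clip(np.searchsorted(y_edges, all_xy[:, 1], side="right") - 1, 0, bins - 1)
    return fdr_grid[ix, iy]
```

In this sketch, candidates whose local fdr falls below a chosen cutoff would be kept as likely true circRNAs. A real analysis would probably need a smoother density estimate than raw histogram counts and a data-driven estimate of `pi0`.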