Cozzuto Luca, Liu Huanle, Pryszcz Leszek P, Pulido Toni Hermoso, Delgado-Tejedor Anna, Ponomarenko Julia, Novoa Eva Maria
Centre for Genomic Regulation, The Barcelona Institute of Science and Technology, Barcelona, Spain.
International Institute of Molecular and Cell Biology, Warsaw, Poland.
Front Genet. 2020 Mar 17;11:211. doi: 10.3389/fgene.2020.00211. eCollection 2020.
The direct RNA sequencing platform offered by Oxford Nanopore Technologies allows RNA molecules to be measured directly, without the need for conversion to complementary DNA, fragmentation or amplification. As such, it can in principle detect virtually any RNA modification present in the molecule being sequenced, as well as provide polyA tail length estimates at the level of individual RNA molecules. Although this technology has been publicly available since 2017, the complexity of the raw Nanopore data, together with the lack of systematic and reproducible pipelines, has greatly hindered access to this technology for the general user. Here we address this problem by providing a fully benchmarked workflow for the analysis of direct RNA sequencing reads, termed . The pipeline starts with a pre-processing module, which converts raw current intensities into multiple types of processed data, including FASTQ and BAM, and performs base-calling, demultiplexing, quality filtering and mapping while providing metrics of run quality. In a second step, the pipeline performs downstream analyses of the mapped reads, including prediction of RNA modifications and estimation of polyA tail lengths. Four direct RNA MinION sequencing runs can be fully processed and analyzed in 10 h on 100 CPUs. The pipeline can also be executed on GPUs, locally or in the cloud, which reduces the run time fourfold. The software is written using the NextFlow framework for parallelization and portability, and relies on Linux containers such as Docker and Singularity to achieve better reproducibility. The workflow can be executed on any Unix-compatible OS on a computer, cluster or cloud without the need to install any additional software or dependencies, and is freely available on GitHub (https://github.com/biocorecrg/master_of_pores). This workflow simplifies the analysis of direct RNA sequencing data, facilitating the study of the (epi)transcriptome at single-molecule resolution.
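To make the two-module structure described in the abstract concrete, the following is a minimal Python sketch, not the workflow's actual code (which is written in NextFlow). The stage names, the `Run` class and the helper functions are hypothetical, and the ordering of steps inside the pre-processing module is an assumption. The sketch only encodes two points stated in the abstract: downstream analyses (RNA modification prediction, polyA tail length estimation) operate on mapped reads produced by pre-processing, and GPU execution reduces run time roughly fourfold.

```python
from dataclasses import dataclass, field

# Hypothetical stage names mirroring the abstract; the real NextFlow
# processes may be named and ordered differently.
PREPROCESSING_STEPS = [
    "base_calling",
    "demultiplexing",
    "quality_filtering",
    "mapping",
    "run_qc_metrics",
]
DOWNSTREAM_STEPS = [
    "rna_modification_prediction",
    "polya_tail_length_estimation",
]


@dataclass
class Run:
    """One direct RNA sequencing run and the steps completed for it."""
    name: str
    outputs: list = field(default_factory=list)


def preprocess(run: Run) -> Run:
    # Module 1: raw current intensities -> processed data (e.g. FASTQ, BAM).
    for step in PREPROCESSING_STEPS:
        run.outputs.append(step)
    return run


def downstream(run: Run) -> Run:
    # Module 2 requires mapped reads, so refuse runs that were not mapped.
    if "mapping" not in run.outputs:
        raise ValueError(f"{run.name}: downstream analysis needs mapped reads")
    for step in DOWNSTREAM_STEPS:
        run.outputs.append(step)
    return run


def gpu_runtime_hours(cpu_hours: float, speedup: float = 4.0) -> float:
    # The abstract reports a roughly fourfold reduction on GPU.
    return cpu_hours / speedup


if __name__ == "__main__":
    runs = [downstream(preprocess(Run(f"run{i}"))) for i in range(1, 5)]
    for r in runs:
        print(r.name, r.outputs[-1])
    print(gpu_runtime_hours(10))
```

For the four MinION runs reported above, `gpu_runtime_hours(10)` gives 2.5 h, which is simply the stated 10 h CPU time divided by the fourfold GPU speed-up.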