Université Paris-Sud, Institut de Génétique et Microbiologie, CNRS UMR 8621, Orsay F-91405, France.
Methods. 2013 Sep 1;63(1):60-5. doi: 10.1016/j.ymeth.2013.06.003. Epub 2013 Jun 25.
RNA-seq experiments are now routinely used for large-scale sequencing of transcripts. In bacteria and archaea, such deep sequencing experiments typically produce 10 to 50 million fragments that cover most of the genome, including intergenic regions. In this context, precisely delineating non-coding elements is challenging. Non-coding elements include the untranslated regions (UTRs) of mRNAs, independent small RNA genes (sRNAs), and antisense RNAs (asRNAs) transcribed from the opposite strand of genes. Here we present a computational pipeline, DETR'PROK (detection of ncRNAs in prokaryotes), built on the Galaxy framework. It takes as input the mapped deep sequencing reads and performs successive steps of clustering, comparison with the existing annotation, and identification of transcribed non-coding fragments, which are classified as putative 5' UTRs, sRNAs or asRNAs. We provide a step-by-step description of the protocol using real-life example data sets from Vibrio splendidus and Escherichia coli.
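The abstract names the pipeline's core steps (clustering mapped reads, comparing the clusters with the annotation, and classifying the remaining transcribed non-coding fragments) but not the exact rules. The Python sketch below shows one plausible version of that logic under stated assumptions. It is not DETR'PROK's implementation, which runs as Galaxy tools. It assumes that reads are already mapped and reduced to (start, end, strand) intervals on a single replicon. The helper names, the parameters `max_gap`, `min_reads` and `min_len`, and the classification rules are all hypothetical. Under those rules, a piece abutting a same-strand gene's 5' end is a putative 5' UTR, a piece overlapping an opposite-strand gene is an asRNA, a piece abutting a gene's 3' end is left unclassified, and anything else is an sRNA.

```python
# Hypothetical sketch of a cluster -> compare -> classify workflow.
# NOT the DETR'PROK implementation; thresholds and rules are illustrative.
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Interval:
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"


@dataclass(frozen=True)
class Gene(Interval):
    name: str = ""


def cluster_reads(reads: List[Interval], max_gap: int = 0,
                  min_reads: int = 2) -> List[Interval]:
    """Merge same-strand reads separated by at most max_gap bp; drop
    clusters supported by fewer than min_reads reads."""
    clusters: List[Interval] = []
    for strand in ("+", "-"):
        current, count = None, 0
        for r in sorted((x for x in reads if x.strand == strand),
                        key=lambda x: x.start):
            if current is not None and r.start <= current.end + max_gap + 1:
                current = Interval(current.start, max(current.end, r.end), strand)
                count += 1
            else:
                if current is not None and count >= min_reads:
                    clusters.append(current)
                current, count = r, 1
        if current is not None and count >= min_reads:
            clusters.append(current)
    return clusters


def subtract_genes(frag: Interval, genes: List[Gene]) -> List[Interval]:
    """Return the parts of frag not covered by any same-strand gene."""
    pieces = [(frag.start, frag.end)]
    for g in genes:
        if g.strand != frag.strand:
            continue
        remaining = []
        for s, e in pieces:
            if g.end < s or g.start > e:   # no overlap: keep as is
                remaining.append((s, e))
                continue
            if s < g.start:                # part left of the gene
                remaining.append((s, g.start - 1))
            if e > g.end:                  # part right of the gene
                remaining.append((g.end + 1, e))
        pieces = remaining
    return [Interval(s, e, frag.strand) for s, e in pieces]


def classify(piece: Interval, genes: List[Gene]) -> Tuple[str, str]:
    """Label a non-coding piece relative to the annotation (assumed rules)."""
    same = [g for g in genes if g.strand == piece.strand]
    other = [g for g in genes if g.strand != piece.strand]
    for g in same:
        # The gene's 5' end is g.start on "+" and g.end on "-".
        if (piece.strand == "+" and piece.end + 1 == g.start) or \
           (piece.strand == "-" and piece.start - 1 == g.end):
            return "5'UTR", g.name
    for g in other:
        if g.start <= piece.end and piece.start <= g.end:
            return "asRNA", g.name
    for g in same:
        # Pieces abutting a gene's 3' end are left unclassified here.
        if (piece.strand == "+" and piece.start - 1 == g.end) or \
           (piece.strand == "-" and piece.end + 1 == g.start):
            return "unclassified", g.name
    return "sRNA", ""


def detect_ncrna(reads: List[Interval], genes: List[Gene], max_gap: int = 0,
                 min_reads: int = 2, min_len: int = 20):
    """Cluster reads, remove annotated coding parts, classify the rest."""
    results = []
    for cluster in cluster_reads(reads, max_gap, min_reads):
        for piece in subtract_genes(cluster, genes):
            if piece.end - piece.start + 1 >= min_len:
                label, gene = classify(piece, genes)
                results.append((piece, label, gene))
    return sorted(results, key=lambda x: (x[0].start, x[0].strand))


# Toy data (invented for illustration only).
genes = [Gene(100, 400, "+", "geneA"), Gene(600, 900, "-", "geneB")]
reads = [Interval(60, 130, "+"), Interval(120, 200, "+"),   # upstream of geneA
         Interval(450, 500, "-"), Interval(480, 520, "-"),  # intergenic
         Interval(650, 720, "+"), Interval(700, 780, "+"),  # opposite geneB
         Interval(1000, 1050, "+")]                         # single read
for piece, label, gene in detect_ncrna(reads, genes):
    print(piece.start, piece.end, piece.strand, label, gene or ".")
```

On the invented toy data, the sketch reports a putative 5' UTR at 60-99 (+) upstream of geneA, an sRNA at 450-520 (-), and an asRNA at 650-780 (+) opposite geneB. The single unsupported read at 1000-1050 is discarded by `min_reads`. Subtracting gene intervals before classification reflects the fact that reads usually cover a UTR and its mRNA continuously, so only the non-coding remainder is labelled.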