Tsuchiya Mariko, Amano Kojiro, Abe Masaya, Seki Misato, Hase Sumitaka, Sato Kengo, Sakakibara Yasubumi
Department of Biosciences and Informatics, Keio University, Yokohama 161-0031, Japan.
Bioinformatics. 2016 Jun 15;32(12):i369-i377. doi: 10.1093/bioinformatics/btw273.
Deep sequencing of the transcripts of regulatory non-coding RNAs generates footprints of post-transcriptional processes. After sequencing, the short reads are mapped to a reference genome, where they form specific mapping patterns, called read mapping profiles, that are distinct from random non-functional degradation patterns. These profiles reflect the maturation processes that produce shorter RNA sequences. Recent next-generation sequencing studies have revealed not only the typical maturation process of miRNAs but also various processing mechanisms of small RNAs derived from tRNAs and snoRNAs.
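As an illustration of what a read mapping profile can look like in practice, the following minimal Python sketch computes per-position read depth over one non-coding RNA locus. It is not taken from SHARAKU: the function name `read_mapping_profile`, the representation of reads as 0-based, end-exclusive `(start, end)` intervals relative to the locus, and the use of raw depth as the profile are all assumptions made for this example.

```python
from typing import List, Tuple


def read_mapping_profile(reads: List[Tuple[int, int]], length: int) -> List[int]:
    """Return per-position read depth over a locus of the given length.

    Assumption: each read is a 0-based, end-exclusive (start, end) interval
    in locus coordinates. Reads are clipped to the locus boundaries.
    """
    # Difference array: +1 where a read starts, -1 just past where it ends.
    diff = [0] * (length + 1)
    for start, end in reads:
        start = max(start, 0)
        end = min(end, length)
        if start >= end:
            continue  # read lies entirely outside the locus
        diff[start] += 1
        diff[end] -= 1

    # A running sum of the difference array gives the depth at each position.
    profile = []
    depth = 0
    for delta in diff[:length]:
        depth += delta
        profile.append(depth)
    return profile


# Hypothetical locus of length 8 with three mapped reads.
example_profile = read_mapping_profile([(0, 3), (1, 3), (5, 8)], 8)
```

In a profile like this, sharp boundaries shared by many reads would suggest defined 5'-end or 3'-end processing, whereas random degradation would be expected to give more diffuse coverage.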
We developed an algorithm termed SHARAKU to align two read mapping profiles of next-generation sequencing outputs for non-coding RNAs. In contrast to previous work, SHARAKU incorporates the primary sequence and secondary structure into the alignment of read mapping profiles, which allows common processing patterns to be detected. On a simulated benchmark dataset, SHARAKU outperformed previous methods both in correctly clustering read mapping profiles by 5'-end and 3'-end processing, separately from degradation patterns, and in detecting similar processing patterns that give rise to the shorter RNAs. Further, using experimental small RNA sequencing data from the common marmoset brain, SHARAKU succeeded in identifying significant clusters of read mapping profiles that share similar processing patterns among small derived RNA families expressed in the brain.
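The idea of aligning two read mapping profiles while also taking sequence and structure into account can be sketched with a standard global alignment (Needleman-Wunsch) over profile columns. This is a generic illustration and not SHARAKU's actual scoring scheme, which the abstract does not specify. The column layout (base, dot-bracket structure symbol, normalized read depth), the weights `w_seq`, `w_str`, `w_depth`, and the linear gap penalty are all hypothetical choices made for this example.

```python
from typing import List, Tuple

# Assumed column layout: (base, dot-bracket symbol, read depth scaled to [0, 1]).
Column = Tuple[str, str, float]


def column_score(a: Column, b: Column,
                 w_seq: float = 1.0, w_str: float = 1.0,
                 w_depth: float = 1.0) -> float:
    """Similarity of two profile columns (hypothetical weighting)."""
    seq = 1.0 if a[0] == b[0] else -1.0        # primary sequence term
    struct = 1.0 if a[1] == b[1] else -1.0     # secondary structure term
    depth = 1.0 - 2.0 * abs(a[2] - b[2])       # read depth term, in [-1, 1]
    return w_seq * seq + w_str * struct + w_depth * depth


def align_profiles(x: List[Column], y: List[Column], gap: float = -1.0) -> float:
    """Global alignment score of two profiles with a linear gap penalty."""
    n, m = len(x), len(y)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + column_score(x[i - 1], y[j - 1]),  # align columns
                dp[i - 1][j] + gap,                                   # gap in y
                dp[i][j - 1] + gap,                                   # gap in x
            )
    return dp[n][m]


# Two hypothetical three-column profiles.
profile_a: List[Column] = [("G", "(", 1.0), ("C", ".", 0.5), ("U", ")", 0.0)]
profile_b: List[Column] = [("G", "(", 1.0), ("A", ".", 0.5), ("U", ")", 0.0)]
score_same = align_profiles(profile_a, profile_a)
score_diff = align_profiles(profile_a, profile_b)
```

Pairwise scores of this kind could then serve as the similarity measure for clustering profiles, in the spirit of the clustering evaluation described above.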
The source code of our program SHARAKU is available at http://www.dna.bio.keio.ac.jp/sharaku/, and the simulated dataset used in this work is available at the same link. Accession code: The sequence data from the whole RNA transcripts in the hippocampus of the left brain used in this work are available from the DNA Data Bank of Japan (DDBJ) Sequence Read Archive (DRA) under the accession number DRA004502.
Supplementary data are available at Bioinformatics online.