SHARAKU: an algorithm for aligning and clustering read mapping profiles of deep sequencing in non-coding RNA processing.

Author information

Tsuchiya Mariko, Amano Kojiro, Abe Masaya, Seki Misato, Hase Sumitaka, Sato Kengo, Sakakibara Yasubumi

Affiliation

Department of Biosciences and Informatics, Keio University, Yokohama 161-0031, Japan.

Publication information

Bioinformatics. 2016 Jun 15;32(12):i369-i377. doi: 10.1093/bioinformatics/btw273.

DOI:10.1093/bioinformatics/btw273

PMID:27307639

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4908357/

Abstract

MOTIVATION

Deep sequencing of the transcripts of regulatory non-coding RNAs generates footprints of post-transcriptional processes. After sequencing, the short reads are mapped to a reference genome, where specific mapping patterns, called read mapping profiles, can be detected that are distinct from random non-functional degradation patterns. These profiles reflect the maturation processes that produce shorter RNA sequences. Recent next-generation sequencing studies have revealed not only the typical maturation process of miRNAs but also the various processing mechanisms of small RNAs derived from tRNAs and snoRNAs.
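As an illustration of what a read mapping profile is, the following minimal sketch computes per-position read depth over one locus from mapped read intervals. It is not taken from SHARAKU: the function name `read_mapping_profile`, the half-open `(start, end)` interval convention, and the toy coordinates are all assumptions made for this example.

```python
def read_mapping_profile(locus_start, locus_end, reads):
    """Return per-position read depth over [locus_start, locus_end).

    `reads` is an iterable of (start, end) half-open genomic intervals
    (an assumed input format, not SHARAKU's own).
    """
    length = locus_end - locus_start
    # Difference array: +1 where a read starts, -1 just past where it ends.
    diff = [0] * (length + 1)
    for start, end in reads:
        s = max(start, locus_start) - locus_start
        e = min(end, locus_end) - locus_start
        if s < e:  # skip reads that do not overlap the locus
            diff[s] += 1
            diff[e] -= 1
    # Prefix sums turn the difference array into depth per position.
    profile = []
    depth = 0
    for d in diff[:length]:
        depth += d
        profile.append(depth)
    return profile


# Toy example: three reads over a 10-nt locus starting at position 100.
toy_profile = read_mapping_profile(100, 110, [(100, 105), (102, 108), (100, 105)])
```

A sharp block of depth at one end of the locus, as in the toy data, is the kind of pattern that would be compared against the more diffuse coverage expected from random degradation.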

RESULTS

We developed an algorithm termed SHARAKU to align two read mapping profiles of next-generation sequencing outputs for non-coding RNAs. In contrast to previous work, SHARAKU incorporates the primary sequence and secondary structure into the alignment of read mapping profiles, allowing common processing patterns to be detected. On a simulated benchmark dataset, SHARAKU outperformed previous methods both in correctly clustering read mapping profiles of 5'-end and 3'-end processing apart from degradation patterns and in detecting similar processing patterns that give rise to the shorter RNAs. Further, using experimental small RNA sequencing data from the common marmoset brain, SHARAKU identified significant clusters of read mapping profiles that share similar processing patterns among small derived RNA families expressed in the brain.
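The abstract does not give SHARAKU's scoring function, so the sketch below only illustrates the general idea of aligning two profiles whose columns carry read depth, base, and secondary-structure information. It uses a plain Needleman-Wunsch global alignment as a stand-in; the column layout `(normalized_depth, base, structure_symbol)`, the weights `w_seq` and `w_struct`, and the linear `gap` penalty are all assumptions for this example.

```python
def column_score(a, b, w_seq=1.0, w_struct=1.0):
    """Similarity of two profile columns (normalized_depth, base, structure_symbol).

    Assumed scoring, not SHARAKU's: depth agreement in [-1, 1] plus
    bonuses for a matching base and a matching structure symbol.
    """
    depth_a, base_a, struct_a = a
    depth_b, base_b, struct_b = b
    score = 1.0 - 2.0 * abs(depth_a - depth_b)
    if base_a == base_b:
        score += w_seq
    if struct_a == struct_b:
        score += w_struct
    return score


def align_profiles(p, q, gap=-1.0):
    """Global (Needleman-Wunsch) alignment score of two profiles."""
    n, m = len(p), len(q)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + column_score(p[i - 1], q[j - 1]),  # align columns
                dp[i - 1][j] + gap,                                   # gap in q
                dp[i][j - 1] + gap,                                   # gap in p
            )
    return dp[n][m]


# Toy profiles: depth normalized to [0, 1], dot-bracket structure symbols.
profile_a = [(1.0, "G", "("), (0.5, "C", "."), (0.0, "A", ")")]
profile_b = [(0.0, "U", "."), (0.5, "C", "."), (1.0, "A", ")")]
self_score = align_profiles(profile_a, profile_a)
cross_score = align_profiles(profile_a, profile_b)
```

Pairwise scores of this kind could be collected into a similarity matrix and passed to a standard clustering procedure, which is one plausible way to group profiles with similar processing patterns.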

AVAILABILITY AND IMPLEMENTATION

The source code of our program SHARAKU is available at http://www.dna.bio.keio.ac.jp/sharaku/, and the simulated dataset used in this work is available at the same link. Accession code: The sequence data from the whole RNA transcripts in the hippocampus of the left brain used in this work is available from the DNA DataBank of Japan (DDBJ) Sequence Read Archive (DRA) under the accession number DRA004502.

CONTACT

yasu@bio.keio.ac.jp

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
