Department of Urology, Erasmus University Medical Center, Be 362a, PO Box 2040, 3000 CA Rotterdam, The Netherlands.
Bioinformatics. 2015 Mar 1;31(5):665-73. doi: 10.1093/bioinformatics/btu696. Epub 2014 Oct 22.
Recent discoveries show that most types of small non-coding RNAs (sncRNAs), such as miRNAs, snoRNAs and tRNAs, are further processed into smaller, putatively active RNA species. Their roles, genetic profiles and underlying processing mechanisms are only partially understood. Proper annotation is essential for determining their abundance and characteristics. Here, we present FlaiMapper, a method that extracts and annotates the locations of sncRNA-derived RNAs (sncdRNAs). These sncdRNAs are often detected in sequencing data, where they are observed as fragments of their precursor sncRNA. Using small RNA-seq read alignments, FlaiMapper annotates fragments primarily by peak detection on the start and end position densities, followed by filtering and a reconstruction step.
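The idea of peak detection on start and end position densities can be illustrated with a minimal Python 3 sketch. It is not FlaiMapper's actual implementation: the function names, the `min_reads` threshold, the fragment length bounds and the simple local-maximum rule are all assumptions made for illustration, and the toy read coordinates are invented.

```python
from collections import Counter

def position_densities(alignments):
    """Count how many reads start and end at each position.

    `alignments` is a list of (start, end) tuples with inclusive
    coordinates relative to one precursor sncRNA (assumed input format).
    """
    starts = Counter(start for start, _ in alignments)
    ends = Counter(end for _, end in alignments)
    return starts, ends

def find_peaks(density, min_reads=2):
    """Return positions whose count is a local maximum with enough reads.

    Filtering step (assumption): positions supported by fewer than
    `min_reads` reads are discarded.
    """
    peaks = []
    for pos, count in density.items():
        left = density.get(pos - 1, 0)
        right = density.get(pos + 1, 0)
        if count >= min_reads and count > left and count >= right:
            peaks.append(pos)
    return sorted(peaks)

def reconstruct_fragments(start_peaks, end_peaks, ends, min_len=15, max_len=35):
    """Pair each start peak with the best-supported compatible end peak.

    Reconstruction step (assumption): an end peak is compatible when the
    resulting fragment length lies within [min_len, max_len].
    """
    fragments = []
    for start in start_peaks:
        candidates = [e for e in end_peaks if min_len <= e - start + 1 <= max_len]
        if candidates:
            best_end = max(candidates, key=lambda e: ends[e])
            fragments.append((start, best_end))
    return fragments

# Toy alignments on a hypothetical precursor: two fragments with slightly
# heterogeneous read ends.
reads = [(10, 31)] * 5 + [(10, 30), (11, 31)] + [(50, 71)] * 4 + [(50, 72)]
starts, ends = position_densities(reads)
fragments = reconstruct_fragments(find_peaks(starts), find_peaks(ends), ends)
print(fragments)
```

Treating start and end positions as separate densities means a fragment boundary can still be called when individual reads are trimmed or extended at the opposite end.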
To assess the performance of FlaiMapper, we used independent, publicly available small RNA-seq data. We were able to detect fragments representing putative sncdRNAs from nearly all types of sncRNA, including 97.8% of the annotated miRNAs in miRBase that have supporting reads. Comparison of FlaiMapper-predicted miRNA boundaries with miRBase entries demonstrated that 89% of the start positions and 54% of the end positions are identical. Additional benchmarking showed that FlaiMapper outperforms existing software. Further analysis revealed a variety of characteristics in the fragments, including sequence motifs and relations with RNA-interacting factors. These characteristics provide a good basis for further research on sncdRNAs.
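A boundary comparison of this kind amounts to counting exact matches between predicted and reference coordinates. The Python 3 sketch below shows one way to compute such agreement rates. The function name, the dictionary layout and the toy entries are assumptions for illustration and do not reproduce the benchmark reported above.

```python
def boundary_agreement(predicted, reference):
    """Return the fractions of identical start and end positions.

    Both arguments map a fragment name to a (start, end) tuple. Only
    names present in both mappings are compared (assumption).
    """
    shared = [name for name in reference if name in predicted]
    if not shared:
        return 0.0, 0.0
    same_start = sum(predicted[n][0] == reference[n][0] for n in shared)
    same_end = sum(predicted[n][1] == reference[n][1] for n in shared)
    return same_start / len(shared), same_end / len(shared)

# Hypothetical entries, not taken from miRBase.
predicted = {"frag-a": (10, 31), "frag-b": (50, 71), "frag-c": (5, 26)}
reference = {"frag-a": (10, 31), "frag-b": (50, 72), "frag-c": (6, 27), "frag-d": (1, 22)}

start_rate, end_rate = boundary_agreement(predicted, reference)
print(f"identical starts: {start_rate:.0%}, identical ends: {end_rate:.0%}")
```

Reporting start and end agreement separately matters because the two boundaries can differ markedly in how well they are reproduced, as the figures above indicate.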
The platform-independent, GPL-licensed Python 2.7 code is available at: https://github.com/yhoogstrate/flaimapper.