Johnson Nathan R, Yeoh Jonathan M, Coruh Ceyda, Axtell Michael J
Huck Institutes of the Life Sciences, Penn State University, University Park, Pennsylvania 16802.
Department of Biology, Knox College, Galesburg, Illinois, 61401.
G3 (Bethesda). 2016 Jul 7;6(7):2103-11. doi: 10.1534/g3.116.030452.
High-throughput sequencing of small RNAs (sRNA-seq) is a popular method used to discover and annotate microRNAs (miRNAs), endogenous short interfering RNAs (siRNAs), and Piwi-associated RNAs (piRNAs). One of the key steps in sRNA-seq data analysis is alignment to a reference genome. sRNA-seq libraries often have a high proportion of reads that align to multiple genomic locations, which makes determining their true origins difficult. Commonly used sRNA-seq alignment methods result in either very low precision (when one alignment is chosen at random) or very low sensitivity (when multi-mapping reads are ignored). Here, we describe and test an sRNA-seq alignment strategy that uses local genomic context to guide decisions on proper placements of multi-mapped sRNA-seq reads. Tests using simulated sRNA-seq data from three different plant genomes demonstrated that this local-weighting method outperforms other alignment strategies. Experimental analyses with real sRNA-seq data also indicate superior performance of local-weighting methods for both plant miRNAs and heterochromatic siRNAs. The local-weighting methods we have developed are implemented as part of the sRNA-seq analysis program ShortStack, which is freely available under a general public license. Improved genome alignments of sRNA-seq data should increase the quality of downstream analyses and genome annotation efforts.
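The idea of using local genomic context to place a multi-mapped read can be illustrated with a minimal sketch. It is not ShortStack's implementation. It assumes, for illustration only, that each candidate location is weighted by the number of uniquely mapped reads within a fixed window around it, and that one placement is drawn with probability proportional to those weights. The function names, the `window` parameter, and the random fallback when no local evidence exists are all hypothetical.

```python
import random

def local_weights(candidates, unique_hits, window=50):
    """Count uniquely mapped reads within +/- window bases of each candidate site.

    candidates  : list of (chrom, pos) tuples where the multi-mapped read could go
    unique_hits : list of (chrom, pos) tuples for uniquely mapped reads
    window      : hypothetical neighborhood size in bases (an assumption)
    """
    weights = []
    for chrom, pos in candidates:
        n = sum(1 for c, p in unique_hits if c == chrom and abs(p - pos) <= window)
        weights.append(n)
    return weights

def place_read(candidates, unique_hits, window=50, rng=None):
    """Pick one candidate location, favoring sites with more nearby unique reads."""
    rng = rng or random.Random(0)
    weights = local_weights(candidates, unique_hits, window)
    if sum(weights) == 0:
        # No local evidence at any site: fall back to a random choice (assumption).
        return rng.choice(candidates)
    # Weighted draw: probability proportional to the local unique-read count.
    return rng.choices(candidates, weights=weights, k=1)[0]

# Toy data: the read maps equally well to chr1:100 and chr2:500,
# but only chr1:100 has uniquely mapped reads nearby.
candidates = [("chr1", 100), ("chr2", 500)]
unique_hits = [("chr1", 90), ("chr1", 110), ("chr1", 120)]
placement = place_read(candidates, unique_hits)
```

In this toy case the second site has zero weight, so the read is always placed at `("chr1", 100)`. A purely random strategy would place it at the unsupported site about half the time, and a strategy that ignores multi-mapping reads would discard it.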