University of East Anglia; School of Computing Sciences; Norwich, UK.
RNA Biol. 2013 Jul;10(7):1221-30. doi: 10.4161/rna.25538. Epub 2013 Jun 28.
Small RNAs (sRNAs) are 20-25 nt non-coding RNAs that act as guides for RNA silencing, a highly sequence-specific regulatory mechanism. The recent increase in sequencing depth has revealed a highly complex and diverse population of sRNAs in both plants and animals. However, the exponential growth of sequencing data has also made it more challenging to identify the individual sRNA transcripts that correspond to biological units (sRNA loci) when identification relies exclusively on the genomic location of the constituent sRNAs, which hinders existing approaches to sRNA locus identification. To infer the location of significant biological units, we propose an sRNA locus detection approach called CoLIde (Co-expression based sRNA Loci Identification), which combines genomic location with the analysis of other information, such as variation in expression levels (expression pattern) and size class distribution. For CoLIde, we define a locus as a union of regions that share the same pattern and lie in close proximity on the genome. Biological relevance, detected through the analysis of size class distribution, is also calculated for each locus. CoLIde can be applied to ordered (e.g., time-dependent) or unordered (e.g., organ, mutant) series of samples, with or without biological/technical replicates. The method reliably identifies known types of loci and performs better than existing locus detection techniques on sequencing data from both plants (e.g., A. thaliana, S. lycopersicum) and animals (e.g., D. melanogaster). CoLIde is available within the UEA Small RNA Workbench, which can be downloaded from: http://srna-workbench.cmp.uea.ac.uk.
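The locus definition above (regions that share an expression pattern and lie close together on the genome are merged) can be illustrated with a minimal Python sketch. It assumes a simplified model that is not CoLIde's actual algorithm. Each region carries one abundance value per sample in an ordered series. Its expression pattern is a hypothetical string of U (up), D (down) or S (straight) symbols, one per pair of consecutive samples, derived from a 2-fold change threshold. Adjacent regions on the same chromosome are merged into one locus when they share a pattern and are separated by at most a hypothetical `max_gap` of 300 nt. The names `Region`, `pattern` and `merge_loci`, and both thresholds, are illustrative only. The sketch ignores replicates, unordered series and the size class analysis.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Region:
    chrom: str
    start: int
    end: int
    expression: List[float]  # abundance in each sample, in series order


def pattern(expr: Sequence[float], fold: float = 2.0) -> str:
    """Hypothetical discretisation: one U/D/S symbol per consecutive sample pair."""
    symbols = []
    for a, b in zip(expr, expr[1:]):
        a, b = a + 1.0, b + 1.0  # pseudocount of 1 avoids division by zero
        if b / a >= fold:
            symbols.append("U")  # expression rises by at least `fold`
        elif a / b >= fold:
            symbols.append("D")  # expression falls by at least `fold`
        else:
            symbols.append("S")  # change is below the threshold
    return "".join(symbols)


def merge_loci(regions: List[Region], max_gap: int = 300,
               fold: float = 2.0) -> List[List[Region]]:
    """Merge neighbouring regions that share a pattern into hypothetical loci."""
    loci: List[List[Region]] = []
    for r in sorted(regions, key=lambda x: (x.chrom, x.start)):
        if loci:
            current = loci[-1]
            locus_end = max(x.end for x in current)
            same_pattern = (pattern(current[0].expression, fold)
                            == pattern(r.expression, fold))
            if (current[0].chrom == r.chrom
                    and r.start - locus_end <= max_gap
                    and same_pattern):
                current.append(r)  # extend the current locus
                continue
        loci.append([r])  # start a new locus
    return loci


regions = [
    Region("1", 100, 120, [1, 10, 10]),
    Region("1", 200, 221, [2, 12, 11]),
    Region("1", 900, 921, [3, 15, 14]),
    Region("1", 950, 970, [20, 2, 2]),
]
loci = merge_loci(regions)
```

Here the first two regions share the pattern `US` and lie 80 nt apart, so they form one locus. The third region has the same pattern but lies too far away, and the fourth is nearby but follows a different pattern (`DS`), so each starts its own locus.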