University of East Anglia, Norwich, United Kingdom.
J Exp Zool B Mol Dev Evol. 2013 Jan;320(1):47-56. doi: 10.1002/jez.b.22483. Epub 2012 Nov 26.
MicroRNAs (miRNAs) are a class of small non-coding RNA (sRNA) involved in gene regulation through mRNA decay and translational repression. In animals, miRNAs have crucial regulatory functions during embryonic development, and they have also been implicated in several diseases, such as cancer and cardiovascular and neurodegenerative disorders. Characterizing new miRNAs is therefore an important step towards studying their function. Recent advances in sequencing technologies have made it possible to capture a high-resolution snapshot of the complete sRNA content of an organism or tissue. A common approach to miRNA detection involves searching such data for telltale miRNA signatures. However, current miRNA prediction tools usually require a sequenced genome, because they analyse the regions flanking aligned sRNA reads in order to identify characteristic miRNA hairpin secondary structures. Since only a handful of published genomes are available, there is a need for novel methods that identify miRNAs in sRNA datasets from high-throughput sequencing devices without requiring a reference genome. This paper presents miRPlex, a tool for miRNA prediction that requires only sRNA datasets as input. Mature miRNAs are predicted from such datasets through a multi-stage process involving filtering, miRNA:miRNA* duplex generation and duplex classification using a support vector machine. Tests on sRNA datasets from model animals demonstrate that the tool is effective at predicting genuine miRNA duplexes and, for some datasets, achieves a high degree of precision when considering only the mature sequence.
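The first two stages named in the abstract, filtering and miRNA:miRNA* duplex generation, can be illustrated with a minimal Python sketch. This is not miRPlex's implementation: the function names, the 19-24 nt length window, the minimum read count, the 0.8 pairing threshold and the simple base-pair counting score are all assumptions made for illustration. The only biological feature relied on is that Dicer-processed duplexes typically carry 2 nt 3' overhangs on each strand. The support vector machine stage is omitted, since the abstract does not describe its features.

```python
from collections import Counter
from itertools import combinations

# Watson-Crick pairs plus the G:U wobble pair found in RNA duplexes.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def filter_reads(reads, min_len=19, max_len=24, min_count=2):
    """Collapse reads to unique sequences and keep miRNA-sized, repeated ones.

    The length window and count cutoff are illustrative assumptions.
    """
    counts = Counter(r.upper().replace("T", "U") for r in reads)
    return {s: n for s, n in counts.items()
            if min_len <= len(s) <= max_len and n >= min_count}


def paired_fraction(guide, star, overhang=2):
    """Fraction of paired bases when the strands are aligned antiparallel,
    leaving `overhang` unpaired nucleotides at the 3' end of each strand.

    A crude stand-in for a real RNA duplex-folding score.
    """
    rev_star = star[::-1]  # read the star strand 3'->5' under the guide
    n = min(len(guide), len(star)) - overhang
    if n <= 0:
        return 0.0
    paired = sum((guide[i], rev_star[i + overhang]) in PAIRS for i in range(n))
    return paired / n


def candidate_duplexes(counts, min_paired=0.8):
    """Pair every two distinct sequences and keep well-paired candidates.

    The more abundant sequence is treated as the mature (guide) strand.
    Returns (guide, star, score) tuples, best score first.
    """
    hits = []
    for a, b in combinations(counts, 2):
        guide, star = (a, b) if counts[a] >= counts[b] else (b, a)
        score = paired_fraction(guide, star)
        if score >= min_paired:
            hits.append((guide, star, score))
    return sorted(hits, key=lambda h: h[2], reverse=True)
```

In a fuller pipeline, each candidate returned by `candidate_duplexes` would be turned into a feature vector and passed to a trained classifier. The all-against-all pairing shown here is quadratic in the number of unique sequences, so a real tool would need a more selective way of proposing partners.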