Université de Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Evolutive UMR 5558, F-69622 Villeurbanne, France.
Inria Lyon Centre, ERABLE team, 56 Bd Niels Bohr, 69100 Villeurbanne, France.
Gigascience. 2022 Oct 25;11. doi: 10.1093/gigascience/giac093.
MicroRNAs (miRNAs) are small noncoding RNAs that are key players in the regulation of gene expression. In the past decade, as high-throughput sequencing technologies have become more accessible, different methods have been developed to identify miRNAs, most of which rely on preexisting reference genomes. However, when a reference genome is absent or of low quality, such identification becomes more difficult. In this context, we developed BrumiR, an algorithm that discovers miRNAs directly and exclusively from small RNA (sRNA) sequencing (sRNA-seq) data. We benchmarked BrumiR on datasets encompassing animal and plant species, using both real and simulated sRNA-seq experiments. The results demonstrate that BrumiR reaches the highest recall for miRNA discovery while being much faster and more efficient than the state-of-the-art tools evaluated. This efficiency allows BrumiR to analyze a large number of sRNA-seq experiments from plant or animal species. Moreover, BrumiR reports additional information on other expressed sequences (sRNAs, isomiRs, etc.), thus maximizing the biological insight gained from sRNA-seq experiments. When a reference genome is available, BrumiR also provides a new mapping tool (BrumiR2reference) that performs an a posteriori exhaustive search to identify the precursor sequences. Finally, we provide a machine learning classifier based on a random forest model that evaluates sequence-derived features to further refine the predictions obtained from the BrumiR-core. The code of BrumiR and all the algorithms that compose the BrumiR toolkit are freely available at https://github.com/camoragaq/BrumiR.