Department of Computer Engineering, Faculty of Electronics, Wroclaw University of Science and Technology, Wroclaw, Poland.
Department of Biomedical Engineering, Faculty of Fundamental Problems of Technology, Wroclaw University of Science and Technology, Wroclaw, Poland.
Sci Rep. 2018 May 15;8(1):7560. doi: 10.1038/s41598-018-25578-3.
Mirtrons are non-canonical microRNAs (miRNAs) encoded in introns, whose biogenesis starts with splicing. They are not processed by Drosha and enter the canonical pathway at the Exportin-5 level. Mirtrons are much less evolutionarily conserved than canonical miRNAs. Because of these differences, canonical miRNA predictors are not applicable to mirtron prediction. Identifying the differences is important for designing mirtron prediction algorithms and may improve the understanding of how mirtrons function. So far, only simple, single-feature comparisons have been reported, and these are insensitive to complex relations between features. We quantified miRNAs with 25 features and showed that the two miRNA species cannot be distinguished using a simple threshold on any single feature. However, under Principal Component Analysis, mirtrons and canonical miRNAs group separately. Moreover, several methodologically diverse machine learning classifiers delivered high classification performance. Using feature selection algorithms, we found that some features previously reported as divergent between the two classes (e.g. bulges in the stem region) did not contribute to improving classification accuracy, which suggests that they are not biologically meaningful. Finally, we proposed a combination of the most important features (including guanine content, hairpin free energy and hairpin length) that conveys a specific pattern crucial for identifying mirtrons.
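To make the idea of quantifying a hairpin with numeric features concrete, the sketch below computes a few of the features named in the abstract (hairpin length and guanine content) plus a paired-base fraction. It is an illustration only, not the authors' pipeline: the function name `extract_features`, the `HairpinFeatures` container, the paired-fraction feature and the dot-bracket input format are assumptions, and the free energy is taken as a precomputed input (for example from an external folding tool) rather than calculated here.

```python
from dataclasses import dataclass


@dataclass
class HairpinFeatures:
    """Hypothetical container for a handful of per-hairpin features."""
    length: int               # hairpin length in nucleotides
    guanine_content: float    # fraction of G in the sequence
    paired_fraction: float    # fraction of bases marked as paired (assumed extra feature)
    free_energy: float        # folding free energy, supplied by the caller


def extract_features(sequence: str, structure: str, free_energy: float) -> HairpinFeatures:
    """Compute simple features from a sequence and its dot-bracket structure.

    Assumes `structure` uses '(' and ')' for paired bases and '.' for
    unpaired ones, and has the same length as `sequence`.
    """
    seq = sequence.upper().replace("T", "U")  # accept DNA or RNA alphabets
    if not seq:
        raise ValueError("sequence must not be empty")
    if len(seq) != len(structure):
        raise ValueError("sequence and structure must have equal length")

    length = len(seq)
    guanine_content = seq.count("G") / length
    paired_fraction = sum(ch in "()" for ch in structure) / length
    return HairpinFeatures(length, guanine_content, paired_fraction, free_energy)


# Toy input, not real miRNA data; the free energy value is made up.
features = extract_features("GGGAAACCC", "(((...)))", free_energy=-1.5)
print(features)
```

Feature vectors of this kind, one per hairpin, are what a dimensionality reduction method or a classifier would take as input; the study itself used 25 features.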