Department of Software and Information Systems Engineering, Ben-Gurion University of the Negev, Beer Sheva, Israel.
BMC Bioinformatics. 2021 May 24;22(1):264. doi: 10.1186/s12859-021-04164-x.
MicroRNAs (miRNAs) are small non-coding RNAs that regulate gene expression post-transcriptionally via base-pairing with complementary sequences on messenger RNAs (mRNAs). Due to the technical challenges involved in the application of high-throughput experimental methods, datasets of direct bona fide miRNA targets exist only for a few model organisms. Machine learning (ML)-based target prediction models were successfully trained and tested on some of these datasets. There is a need to further apply the trained models to organisms in which experimental training data are unavailable. However, it is largely unknown how the features of miRNA-target interactions evolve and whether some features have remained fixed during evolution, raising questions regarding the general, cross-species applicability of currently available ML methods.
We examined the evolution of miRNA-target interaction rules and used data science and ML approaches to investigate whether these rules are transferable between species. We analyzed eight datasets of direct miRNA-target interactions in four species (human, mouse, worm, cattle). Using ML classifiers, we achieved high accuracy for intra-dataset classification and found that the most influential features of all datasets overlap significantly. To explore the relationships between datasets, we measured the divergence of their miRNA seed sequences and evaluated the performance of cross-dataset classification. We found that both measures coincide with the evolutionary distance between the compared species.
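As a rough illustration of what measuring seed-sequence divergence between two datasets could look like, the sketch below compares the seed distributions of two miRNA collections. Several things here are assumptions rather than details taken from the abstract: the seed is taken as nucleotides 2-8 of the mature miRNA, the Jensen-Shannon divergence is used as the divergence measure, and the function names and input sequences are hypothetical. The authors' actual implementation is in the repository linked in this record.

```python
from collections import Counter
from math import log2


def seed(mirna_seq: str) -> str:
    """Return nucleotides 2-8 of a mature miRNA (an assumed seed definition)."""
    return mirna_seq.upper().replace("T", "U")[1:8]


def seed_distribution(mirnas):
    """Relative frequency of each seed among the miRNAs of one dataset."""
    counts = Counter(seed(s) for s in mirnas)
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two seed distributions.

    0.0 means identical distributions; 1.0 means no shared seeds.
    This measure is an assumption chosen for illustration.
    """
    keys = set(p) | set(q)
    mix = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl(a):
        # Kullback-Leibler divergence of a from the mixture distribution.
        return sum(a[k] * log2(a[k] / mix[k]) for k in a if a[k] > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)


# Toy inputs for illustration only; not drawn from the paper's datasets.
dataset_a = ["UGAGGUAGUAGGUUGUAUAGUU", "UGAGGUAGUAGGUUGUGUGGUU"]
dataset_b = ["UAGCUUAUCAGACUGAUGUUGA"]

divergence = js_divergence(seed_distribution(dataset_a),
                           seed_distribution(dataset_b))
```

Computing this value for every pair of datasets gives a divergence matrix that can be set beside a matrix of cross-dataset classification scores to see whether the two track each other.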
The transferability of miRNA-targeting rules between species depends on several factors, of which the composition of seed families and evolutionary distance are the most strongly associated. Furthermore, our feature-importance results suggest that some miRNA-target features have evolved while others have remained fixed during the evolution of the species. Our findings lay the foundation for the future development of target prediction tools that could be applied to "non-model" organisms for which minimal experimental data are available.
The code is freely available at https://github.com/gbenor/TPVOD .