BMC Bioinformatics. 2014;15 Suppl 1(Suppl 1):S4. doi: 10.1186/1471-2105-15-S1-S4. Epub 2014 Jan 10.
MicroRNAs (miRNAs) are small non-coding RNAs that play a key role in the post-transcriptional regulation of many genes. Elucidating miRNA-regulated gene networks is crucial for understanding the mechanisms and functions of miRNAs in many biological processes, such as cell proliferation, development, differentiation and cell homeostasis, as well as in many types of human tumors. To this end, we recently presented HOCCLUS2, a biclustering method for the discovery of miRNA regulatory networks. Experiments on predicted interactions revealed that the statistical and biological consistency of the resulting networks is negatively affected by the poor reliability of the output of miRNA target prediction algorithms. Some learning approaches have recently been proposed that learn to combine the outputs of distinct prediction algorithms in order to improve their accuracy. However, applying classical supervised learning algorithms raises two challenges: i) datasets of experimentally verified interactions contain only positive examples, and ii) the numbers of labeled and unlabeled examples are unbalanced.
We present a learning algorithm that learns to combine the scores returned by several prediction algorithms by exploiting the information conveyed by both validated (positive-only) and unlabeled examples of interactions. To face these two related challenges, we resort to a semi-supervised ensemble learning setting. Results obtained using miRTarBase as the set of labeled (positive) interactions and mirDIP as the set of unlabeled interactions show a significant improvement in prediction quality over competing approaches. This solution also improves the effectiveness of HOCCLUS2 in discovering biologically realistic miRNA:mRNA regulatory networks from large-scale prediction data. Using the miR-17-92 gene cluster family as a reference system and comparing the results with previous experiments, we find a large increase in the number of biclusters significantly enriched in pathways, consistent with miR-17-92 functions.
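The two challenges above (positive-only labels and a much larger unlabeled pool) are commonly handled with bagging-based positive-unlabeled (PU) learning. The sketch below illustrates that generic technique only; it is not the authors' algorithm, whose details are not given here. It assumes each interaction is a vector of scores from distinct target-prediction tools, normalized to [0, 1], and the names `pu_bagging`, `train_logistic` and `predict` are hypothetical. Each round draws a positive-sized random subset of the unlabeled pool as provisional negatives, trains a small logistic-regression combiner, and scores only the unlabeled items left out of that subset.

```python
import math
import random
from typing import List, Sequence

# Assumption (hypothetical): one interaction = a vector of scores from
# distinct target-prediction tools, each normalized to [0, 1].
Vector = Sequence[float]


def _sigmoid(z: float) -> float:
    z = max(-30.0, min(30.0, z))  # clamp to avoid math.exp overflow
    return 1.0 / (1.0 + math.exp(-z))


def predict(w: List[float], x: Vector) -> float:
    """Probability that x is a true interaction under weights w (bias last)."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + w[-1]
    return _sigmoid(z)


def train_logistic(X: List[Vector], y: List[int],
                   epochs: int = 200, lr: float = 0.5) -> List[float]:
    """Fit logistic-regression weights (bias stored last) by batch gradient descent."""
    d = len(X[0])
    w = [0.0] * (d + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            err = predict(w, xi) - yi
            for j in range(d):
                grad[j] += err * xi[j]
            grad[d] += err
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w


def pu_bagging(positives: List[Vector], unlabeled: List[Vector],
               n_rounds: int = 20, seed: int = 0) -> List[float]:
    """Score unlabeled interactions with a bagging ensemble of PU classifiers.

    Each round treats a random subset of the unlabeled pool, no larger than the
    positive set, as provisional negatives (balancing the classes), trains one
    combiner, and scores only the unlabeled items outside that subset.
    """
    if not unlabeled:
        return []
    rng = random.Random(seed)
    k = min(len(positives), max(1, len(unlabeled) // 2))
    totals = [0.0] * len(unlabeled)
    counts = [0] * len(unlabeled)
    for _ in range(n_rounds):
        sampled = set(rng.sample(range(len(unlabeled)), k))
        X = list(positives) + [unlabeled[i] for i in sampled]
        y = [1] * len(positives) + [0] * k
        w = train_logistic(X, y)
        for i, u in enumerate(unlabeled):
            if i not in sampled:  # out-of-bag: never scored by a model that saw it
                totals[i] += predict(w, u)
                counts[i] += 1
    # Average out-of-bag scores; NaN if an item was never left out.
    return [t / c if c else float("nan") for t, c in zip(totals, counts)]
```

Unlabeled interactions whose averaged score is high resemble the validated ones and are candidates for retention. Scoring only out-of-bag items keeps a hidden positive from being judged by a model that was trained with it labeled as a negative.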
The proposed approach proves fundamental for the computational discovery of miRNA regulatory networks from large-scale predictions. It paves the way for the systematic application of HOCCLUS2 to a comprehensive reconstruction of all the possible multiple interactions established by miRNAs in regulating the expression of gene networks, which would otherwise be impossible to reconstruct from experimentally validated interactions alone.