Department of Community Information Systems, Zefat Academic College, Zefat, Israel.
Department of Information Systems, The Max Stern Yezreel Valley Academic College, Yezreel, Israel.
Bioinformatics. 2019 Oct 15;35(20):4020-4028. doi: 10.1093/bioinformatics/btz204.
Disease is often manifested via changes in transcript and protein abundance. MicroRNAs (miRNAs) are instrumental in regulating protein abundance and may measurably influence transcript levels. miRNAs often target more than one mRNA (for humans, the average is three), and mRNAs are often targeted by more than one miRNA (for the genes considered in this study, the average is also three). It is therefore difficult to determine which miRNAs may cause the observed differential gene expression. We present maTE, a novel machine-learning approach that integrates information about miRNA target genes with gene expression data. maTE requires a sufficient number of patient and control samples. These samples are used to train classifiers that separate the two groups on a per-miRNA basis. Multiple high-scoring miRNAs are then combined to build a final classifier that improves separation.
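The per-miRNA scoring and final-classifier steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the maTE KNIME workflow: a nearest-centroid classifier with leave-one-out cross-validation stands in for whatever classifier the workflow actually uses, and the gene names, the miRNA names `miR-X` and `miR-Y`, and all expression values are hypothetical toy data.

```python
from typing import Dict, List, Tuple

# Hypothetical toy data (not from the paper): 3 patients (label 1), 3 controls (label 0).
LABELS = [1, 1, 1, 0, 0, 0]
EXPRESSION: Dict[str, List[float]] = {
    "GENE_A": [5.0, 5.2, 4.9, 1.0, 1.1, 0.9],
    "GENE_B": [4.8, 5.1, 5.0, 1.2, 0.8, 1.0],
    "GENE_C": [2.0, 3.0, 2.5, 2.4, 2.1, 2.9],
    "GENE_D": [3.0, 2.0, 2.6, 2.2, 2.8, 2.5],
}
# Hypothetical miRNA -> target-gene map; in practice this comes from a target database.
TARGETS: Dict[str, List[str]] = {
    "miR-X": ["GENE_A", "GENE_B"],
    "miR-Y": ["GENE_C", "GENE_D"],
}


def nearest_centroid(train_X, train_y, x):
    """Predict the label whose class centroid is closest (squared Euclidean) to x."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [r for r, l in zip(train_X, train_y) if l == label]
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loo_accuracy(X: List[List[float]], y: List[int]) -> float:
    """Leave-one-out accuracy of the nearest-centroid classifier."""
    correct = 0
    for i in range(len(X)):
        pred = nearest_centroid(X[:i] + X[i + 1:], y[:i] + y[i + 1:], X[i])
        correct += int(pred == y[i])
    return correct / len(X)


def sample_matrix(genes: List[str]) -> List[List[float]]:
    """One row per sample, one column per gene."""
    return [[EXPRESSION[g][s] for g in genes] for s in range(len(LABELS))]


def score_mirnas() -> List[Tuple[str, float]]:
    """Score each miRNA by how well its target genes alone separate the groups."""
    scores = {}
    for mirna, genes in TARGETS.items():
        present = [g for g in genes if g in EXPRESSION]
        if present:
            scores[mirna] = loo_accuracy(sample_matrix(present), LABELS)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def final_accuracy(ranked: List[Tuple[str, float]], top_k: int) -> float:
    """Pool the target genes of the top_k miRNAs into one final classifier."""
    genes = sorted({g for m, _ in ranked[:top_k] for g in TARGETS[m] if g in EXPRESSION})
    return loo_accuracy(sample_matrix(genes), LABELS)


ranked = score_mirnas()
print(ranked)
print(final_accuracy(ranked, top_k=1))
```

In this toy setting `miR-X`, whose targets differ sharply between the groups, ranks first with perfect leave-one-out accuracy, while `miR-Y`, whose targets are noise, scores lower. Pooling the target genes of the top-ranked miRNAs mirrors the idea of a final classifier built from multiple high-scoring miRNAs.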
The aim of the study is to find the set of miRNAs whose regulation of their target genes best explains the difference between groups (e.g. cancer versus control). maTE provides a list of significant groups of genes, where each group is targeted by a specific miRNA. For the datasets used in this study, maTE generally achieves an accuracy well above 80%. The results also show that when the accuracy is much lower (e.g. ∼50%), the set of miRNAs provided is likely not causative of the difference in expression. This new approach of integrating miRNA regulation with expression data yields powerful results and is independent of external labels and training data. It thereby opens new avenues for exploring miRNA regulation and may enable the development of miRNA-based biomarkers and drugs.
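The abstract interprets an accuracy near 50% as a sign that the selected miRNAs are likely not causative. One way to make "near chance" concrete, which is our assumption and not a procedure stated in the abstract, is a one-sided exact binomial test of the number of correctly classified samples against chance level. The sketch assumes balanced groups (chance = 0.5) and treats predictions as independent, which cross-validated predictions only approximately are; the 20-sample example is hypothetical.

```python
from math import comb


def binomial_p_value(correct: int, total: int, chance: float = 0.5) -> float:
    """One-sided exact P(X >= correct) for X ~ Binomial(total, chance)."""
    return sum(
        comb(total, k) * chance**k * (1 - chance) ** (total - k)
        for k in range(correct, total + 1)
    )


def above_chance(correct: int, total: int, alpha: float = 0.05, chance: float = 0.5) -> bool:
    """True if the accuracy correct/total is significantly above chance at level alpha."""
    return binomial_p_value(correct, total, chance) < alpha


# Hypothetical example: 20 samples classified with 85% vs 50% accuracy.
print(above_chance(17, 20))  # True  (p = 1351 / 2**20, about 0.0013)
print(above_chance(10, 20))  # False (p about 0.59)
```

For unbalanced groups, `chance` would more sensibly be set to the majority-class proportion rather than 0.5.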
The KNIME workflow implementing maTE is available at Bioinformatics online.
Supplementary data are available at Bioinformatics online.