Multi-view Co-training for microRNA Prediction.

Affiliation

Department of Systems and Computer Engineering, Carleton University, Ottawa, Ontario, Canada.

Publication information

Sci Rep. 2019 Jul 29;9(1):10931. doi: 10.1038/s41598-019-47399-8.

DOI:10.1038/s41598-019-47399-8

PMID:31358877

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6662744/

Abstract

MicroRNAs (miRNAs) are short, non-coding RNAs involved in cell regulation at post-transcriptional and translational levels. Numerous computational predictors of miRNA have been developed that generally classify miRNA based on either sequence- or expression-based features. While these methods are highly effective, they require large labelled training data sets, which are unavailable for many species. Simultaneously, emerging high-throughput wet-lab experimental procedures are producing large unlabelled data sets of genomic sequence and RNA expression profiles. Existing methods use supervised machine learning and are therefore unable to leverage these unlabelled data. In this paper, we design and develop a multi-view co-training approach for the classification of miRNA to maximize the utility of unlabelled training data by taking advantage of multiple views of the problem. Starting with only 10 labelled training samples, co-training is shown to significantly (p < 0.01) increase the classification accuracy of both sequence- and expression-based classifiers, without requiring any new labelled training data. After 11 iterations of co-training, the expression-based view of miRNA classification experiences an average increase in AUPRC of 15.81% over six species, compared to 11.90% for self-training and 4.84% for passive learning. Similar results are observed for sequence-based classifiers, with increases of 46.47%, 39.53%, and 29.43% for co-training, self-training, and passive learning, respectively. The co-trained sequence- and expression-based classifiers are integrated into a final confidence-based classifier, which shows improved performance compared to both the expression (1.5%, p = 0.021) and sequence (3.7%, p = 0.006) views. This study represents the first application of multi-view co-training to miRNA prediction and shows great promise, particularly for understudied species with little available training data.
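The co-training loop summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the nearest-centroid classifier, the shared pool of pseudo-labels, the one-sample-per-round setting, the "more confident view wins" combination rule, and all function names (`co_train`, `combined_score`, `make_toy_data`) are assumptions made for illustration. Only the two-view setup, the 10 seed labels, and the 11 iterations come from the abstract.

```python
import random
from statistics import fmean

def centroid(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    return [fmean(col) for col in zip(*rows)]

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

class CentroidClassifier:
    """Toy stand-in for a real classifier (assumption): nearest class centroid."""
    def fit(self, X, y):
        self.pos = centroid([x for x, t in zip(X, y) if t == 1])
        self.neg = centroid([x for x, t in zip(X, y) if t == 0])
        return self

    def score(self, x):
        """Pseudo-probability of the positive class, in [0, 1]."""
        dp, dn = sq_dist(x, self.pos), sq_dist(x, self.neg)
        return 0.5 if dp + dn == 0 else dn / (dp + dn)

def co_train(view_a, view_b, labels, iterations=11, per_round=1):
    """view_a / view_b map sample id -> features; labels maps id -> 0/1 (seed set).

    Each round, each view trains on the current labelled pool and pseudo-labels
    its most confident unlabelled samples, which the other view then trains on.
    """
    labels = dict(labels)  # copy so the caller's seed set is not modified
    for _ in range(iterations):
        for view in (view_a, view_b):
            unlabelled = [i for i in view if i not in labels]
            if not unlabelled:
                return labels
            ids = list(labels)
            clf = CentroidClassifier().fit([view[i] for i in ids],
                                           [labels[i] for i in ids])
            # Confidence = distance of the score from the 0.5 decision boundary.
            ranked = sorted(unlabelled,
                            key=lambda i: abs(clf.score(view[i]) - 0.5),
                            reverse=True)
            for i in ranked[:per_round]:
                labels[i] = int(clf.score(view[i]) >= 0.5)
    return labels

def combined_score(clf_a, clf_b, xa, xb):
    """Confidence-based integration (assumed rule): the more confident view wins."""
    sa, sb = clf_a.score(xa), clf_b.score(xb)
    return sa if abs(sa - 0.5) >= abs(sb - 0.5) else sb

def make_toy_data(n=40, seed=0):
    """Synthetic two-view data with well-separated classes (illustration only)."""
    rng = random.Random(seed)
    view_a, view_b, truth = {}, {}, {}
    for i in range(n):
        t = i % 2
        truth[i] = t
        view_a[i] = [rng.gauss(3.0 * t, 0.5), rng.gauss(0.0, 0.5)]
        view_b[i] = [rng.gauss(-3.0 * t, 0.5)]
    return view_a, view_b, truth

if __name__ == "__main__":
    va, vb, truth = make_toy_data()
    seed_labels = {i: truth[i] for i in range(10)}  # 10 labelled seed samples
    grown = co_train(va, vb, seed_labels)
    print(len(seed_labels), "seed labels grew to", len(grown))
```

In a real setting the two views would be sequence-derived and expression-derived feature vectors for the same candidate miRNAs, and the toy classifier would be replaced by a proper learner with calibrated confidence scores.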

Similar articles

A semi-supervised machine learning framework for microRNA classification.

Hum Genomics. 2019 Oct 22;13(Suppl 1):43. doi: 10.1186/s40246-019-0221-7.

Combining Supervised and Unsupervised Learning for Improved miRNA Target Prediction.

IEEE/ACM Trans Comput Biol Bioinform. 2018 Sep-Oct;15(5):1594-1604. doi: 10.1109/TCBB.2017.2727042. Epub 2017 Jul 13.

microPred: effective classification of pre-miRNAs for human miRNA gene prediction.

Bioinformatics. 2009 Apr 15;25(8):989-95. doi: 10.1093/bioinformatics/btp107. Epub 2009 Feb 20.

The Limitations of Existing Approaches in Improving MicroRNA Target Prediction Accuracy.

Methods Mol Biol. 2017;1617:133-158. doi: 10.1007/978-1-4939-7046-9_10.

Predicting novel microRNA: a comprehensive comparison of machine learning approaches.

Brief Bioinform. 2019 Sep 27;20(5):1607-1620. doi: 10.1093/bib/bby037.

miRClassify: an advanced web server for miRNA family classification and annotation.

Comput Biol Med. 2014 Feb;45:157-60. doi: 10.1016/j.compbiomed.2013.12.007. Epub 2013 Dec 21.

Co-Labeling for Multi-View Weakly Labeled Learning.

IEEE Trans Pattern Anal Mach Intell. 2016 Jun;38(6):1113-25. doi: 10.1109/TPAMI.2015.2476813. Epub 2015 Sep 4.

Comprehensive machine-learning-based analysis of microRNA-target interactions reveals variable transferability of interaction rules across species.

BMC Bioinformatics. 2021 May 24;22(1):264. doi: 10.1186/s12859-021-04164-x.

MicroRNA transcription start site prediction with multi-objective feature selection.

Stat Appl Genet Mol Biol. 2012 Jan 6;11(1):Article 6. doi: 10.2202/1544-6115.1743.

Cited by

Enhancing severe hypoglycemia prediction in type 2 diabetes mellitus through multi-view co-training machine learning model for imbalanced dataset.

Sci Rep. 2024 Sep 30;14(1):22741. doi: 10.1038/s41598-024-69844-z.

Species-specific microRNA discovery and target prediction in the soybean cyst nematode.

Sci Rep. 2023 Oct 17;13(1):17657. doi: 10.1038/s41598-023-44469-w.

Prostatic fluid exosome-mediated microRNA-155 promotes the pathogenesis of type IIIA chronic prostatitis.

Transl Androl Urol. 2021 May;10(5):1976-1987. doi: 10.21037/tau-21-139.

CFSP: a collaborative frequent sequence pattern discovery algorithm for nucleic acid sequence classification.

PeerJ. 2020 Apr 20;8:e8965. doi: 10.7717/peerj.8965. eCollection 2020.

A semi-supervised machine learning framework for microRNA classification.

Hum Genomics. 2019 Oct 22;13(Suppl 1):43. doi: 10.1186/s40246-019-0221-7.
