Lawarde Ankita, Khatun Masuma, Lingasamy Prakash, Salumets Andres, Modhukur Vijayachitra
Department of Obstetrics and Gynecology, Institute of Clinical Medicine, University of Tartu, Tartu, Estonia.
Celvia CC AS, Tartu, Estonia.
Front Bioinform. 2025 May 6;5:1571476. doi: 10.3389/fbinf.2025.1571476. eCollection 2025.
MicroRNAs (miRNAs) regulate gene expression and play an important role in carcinogenesis through complex interactions with messenger RNAs (mRNAs) and long non-coding RNAs (lncRNAs). Despite their established influence on tumor progression and therapeutic resistance, the application of miRNA interaction networks for tumor tissue-of-origin (TOO) classification remains underexplored.
We developed a machine learning (ML) framework that integrates miRNA-mRNA-lncRNA interaction networks to classify tumors by their tissue of origin. Using transcriptomic profiles from 14 cancer types in The Cancer Genome Atlas (TCGA), we constructed co-expression networks and applied multiple feature selection techniques, including recursive feature elimination (RFE), random forest (RF), Boruta, and linear discriminant analysis (LDA), to identify a minimal yet informative subset of miRNA features. Ensemble ML algorithms were trained and validated with stratified five-fold cross-validation to assess performance robustly across class distributions.
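As an illustration of the stratified five-fold scheme mentioned above, the following is a minimal sketch in plain Python, not the authors' pipeline. The class labels (`type_A`, `type_B`, `type_C`), the sample counts, and the helper names `stratified_kfold` and `train_test_splits` are hypothetical and chosen only for this example. The idea is that each fold receives roughly the same share of every cancer type, so rare classes are represented in every test split.

```python
from collections import defaultdict
import random


def stratified_kfold(labels, k=5, seed=0):
    """Assign sample indices to k folds while keeping class proportions
    roughly equal across folds (hypothetical helper for illustration)."""
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal shuffled indices round-robin so each fold gets about 1/k of the class.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


def train_test_splits(folds):
    """Yield (train_indices, test_indices), using each fold once as the test set."""
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield sorted(train), sorted(test)


# Toy, deliberately imbalanced labels (assumed data, not from the study).
labels = ["type_A"] * 50 + ["type_B"] * 20 + ["type_C"] * 10
folds = stratified_kfold(labels, k=5, seed=42)

for train_idx, test_idx in train_test_splits(folds):
    counts = defaultdict(int)
    for idx in test_idx:
        counts[labels[idx]] += 1
    print(len(train_idx), len(test_idx), dict(counts))
```

In practice a library routine would normally be used for this step; the hand-written version is shown only to make the stratification logic explicit.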
Our models achieved an overall 99% classification accuracy, distinguishing 14 cancer types with high robustness and generalizability. A minimal set of 150 miRNAs selected via RFE resulted in optimal performance across all classifiers. Furthermore, in silico validation revealed that many of the top miRNAs, including and , were not only highly central in the network but also correlated with patient survival and drug response. In addition, functional enrichment analyses indicated significant involvement of miRNAs in pathways such as -beta signaling, epithelial-mesenchymal transition, and immune modulation. Our comparative analysis demonstrated that models based on miRNA outperformed those using mRNA or lncRNA classifiers.
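To make the notion of a feature being "highly central in the network" concrete, here is a small sketch under stated assumptions. The abstract does not specify the correlation measure, the edge threshold, or the centrality metric, so this example assumes Pearson correlation, an absolute-correlation cutoff of 0.8, and normalized degree centrality. The feature names and expression values are invented for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant profile: treat as uncorrelated
    return cov / math.sqrt(var_x * var_y)


def coexpression_edges(profiles, threshold=0.8):
    """Connect two features when |r| meets the (assumed) threshold."""
    names = sorted(profiles)
    edges = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(profiles[a], profiles[b])
            if abs(r) >= threshold:
                edges.append((a, b, r))
    return edges


def degree_centrality(profiles, edges):
    """Fraction of the other nodes each node is connected to."""
    degree = {name: 0 for name in profiles}
    for a, b, _ in edges:
        degree[a] += 1
        degree[b] += 1
    denom = max(len(profiles) - 1, 1)
    return {name: d / denom for name, d in degree.items()}


# Hypothetical expression profiles across five samples.
profiles = {
    "mirna_1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "mrna_1": [5.0, 4.0, 3.0, 2.0, 1.0],     # anti-correlated with mirna_1
    "lncrna_1": [2.0, 4.0, 6.0, 8.0, 10.0],  # correlated with mirna_1
    "mrna_2": [1.0, 3.0, 1.0, 3.0, 1.0],     # uncorrelated with the others
}

edges = coexpression_edges(profiles, threshold=0.8)
centrality = degree_centrality(profiles, edges)
for name, score in sorted(centrality.items(), key=lambda kv: -kv[1]):
    print(name, round(score, 3))
```

Other centrality measures (betweenness, eigenvector) would rank nodes differently; degree is used here only because it is the simplest to compute by hand.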
Our integrated framework provides a biologically grounded, interpretable, and highly accurate approach for tumor tissue-of-origin classification. The identified miRNA biomarkers demonstrate strong translational potential, supported by clinical trial overlap, drug sensitivity data, and survival analyses. This work highlights the power of combining miRNA network biology with ML to improve precision oncology diagnostics and supports future development of liquid biopsy-based cancer classification.