Wang Chloe, Kouznetsova Valentina L, Kesari Santosh, Tsigelny Igor F
Mentor Assistance Program, San Diego Supercomputer Center, University of California San Diego, La Jolla, CA, USA.
San Diego Supercomputer Center, University of California San Diego, La Jolla, CA, USA.
Childs Nerv Syst. 2025 Jun 30;41(1):221. doi: 10.1007/s00381-025-06874-6.
Medulloblastoma (MB) is the most common malignant brain tumor in children. Current diagnostic methods, such as MRI and lumbar puncture, are either invasive or insufficiently sensitive, making early diagnosis challenging. MicroRNAs (miRNAs) have emerged as promising biomarkers for cancer diagnosis because their expression is dysregulated in tumors. This study aims to develop a novel machine learning (ML)-based diagnostic tool for MB using miRNA biomarkers.
We collected miRNAs associated with MB and random control miRNAs, and generated sequence-based and target gene-based descriptors for each. We used the WEKA software to evaluate several ML models, including logistic regression, naïve Bayes, and multilayer perceptron (MLP). Attribute selection reduced noise by retaining the 24 most significant features. Model performance was evaluated using 10-fold cross-validation and independent test datasets.
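As a rough illustration of the kind of pipeline described above, the following Python sketch computes simple sequence-based descriptors for a miRNA and produces 10-fold cross-validation splits. It is not the authors' implementation: the abstract does not list the descriptors used, so the composition features here (length, GC content, mono- and dinucleotide frequencies), the function names `sequence_descriptors` and `k_fold_indices`, and the example sequence are all assumptions chosen for illustration. The study itself used WEKA rather than custom code.

```python
from collections import Counter
from itertools import product

BASES = "ACGU"


def sequence_descriptors(seq):
    """Return simple composition descriptors for one miRNA sequence.

    Hypothetical feature set: the descriptors used in the study are
    not specified in the abstract.
    """
    seq = seq.upper().replace("T", "U")  # accept DNA-style input
    n = len(seq)
    if n < 2:
        raise ValueError("sequence must contain at least 2 nucleotides")
    mono = Counter(seq)
    di = Counter(seq[i:i + 2] for i in range(n - 1))
    feats = {"length": n, "gc_content": (mono["G"] + mono["C"]) / n}
    for b in BASES:
        feats[f"freq_{b}"] = mono[b] / n
    for a, b in product(BASES, repeat=2):
        feats[f"freq_{a}{b}"] = di[a + b] / (n - 1)
    return feats


def k_fold_indices(n_samples, k=10):
    """Yield (train, test) index lists so each sample is held out once.

    Samples are assigned to folds in round-robin order; no shuffling
    or class stratification is applied in this sketch.
    """
    indices = list(range(n_samples))
    for fold in range(k):
        test = indices[fold::k]
        held_out = set(test)
        train = [i for i in indices if i not in held_out]
        yield train, test


# Arbitrary 22-nucleotide example sequence (illustrative only).
example = sequence_descriptors("UAGCUUAUCAGACUGAUGUUGA")
folds = list(k_fold_indices(25, k=10))
```

In a real workflow the resulting feature dictionaries would be combined with target gene-based descriptors, passed through attribute selection, and only then used to train and evaluate a classifier on each train/test split.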
Logistic regression achieved the highest training accuracy (96.2%), while the MLP model was selected for further testing due to its ability to capture complex nonlinear relationships in biological data. The MLP model showed 78.6% accuracy on an independent MB dataset and successfully distinguished MB miRNAs from those associated with chronic myeloid leukemia (CML), further validating its specificity.
The ML-based diagnostic tool using miRNA biomarkers shows promise for improving MB diagnosis, offering a non-invasive alternative to traditional methods. Further validation with larger datasets and diverse control groups is needed to refine the model.