Hein Kaung Htet, Woo Wai Lok, Rafiee Gholamreza
School of Electronics, Electrical Engineering, and Computer Science, Queen's University Belfast, Belfast BT9 5BN, UK.
Department of Computer and Information Sciences, Northumbria University, Newcastle NE1 8ST, UK.
Healthcare (Basel). 2025 May 11;13(10):1114. doi: 10.3390/healthcare13101114.
Medulloblastoma, the most common malignant brain tumor in children, is classified into four primary molecular subgroups (WNT, SHH, Group 3, and Group 4), each exhibiting significant molecular heterogeneity and varied survival outcomes. Accurate classification of these subgroups is crucial for optimizing treatment and improving patient outcomes. DNA methylation profiling is a promising approach for subgroup classification; however, its application is still evolving, with ongoing efforts to improve accessibility and develop more accurate classification methods.
This study aims to develop a supervised machine learning-based framework using Illumina 450K methylation data to classify medulloblastoma into seven molecular subgroups: WNT, SHH-Infant, SHH-Child, Group3-LowRisk, Group3-HighRisk, Group4-LowRisk, and Group4-HighRisk, incorporating age and risk factors for enhanced subgroup differentiation.
The proposed model uses six metagenes that capture the underlying patterns of the 10,000 highest-variance probes in the Illumina 450K data, enhancing the representation of the methylation data while reducing computational demands.
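As a rough illustration of the probe-selection step only, the following minimal sketch ranks probes by their variance across samples and keeps the top k. The function name `top_variance_probes`, the probe IDs, and the beta values are all hypothetical and not taken from the study. The abstract does not state how the six metagenes are derived from the selected probes, so that step is not shown.

```python
from statistics import pvariance


def top_variance_probes(beta, probe_ids, k):
    """Return the IDs of the k probes with the highest variance across samples.

    beta: list of rows, one row per sample, one beta value per probe.
    """
    # Transpose so that each tuple holds one probe's values across all samples.
    columns = list(zip(*beta))
    variances = [pvariance(col) for col in columns]
    # Rank probe indices by descending variance (ties keep their original order).
    order = sorted(range(len(probe_ids)), key=lambda j: -variances[j])
    return [probe_ids[j] for j in order[:k]]


# Toy data: 4 samples x 3 hypothetical probes (values are made up).
beta = [
    [0.10, 0.50, 0.90],
    [0.90, 0.50, 0.80],
    [0.20, 0.52, 0.85],
    [0.80, 0.48, 0.90],
]
probes = ["cg_a", "cg_b", "cg_c"]
selected = top_variance_probes(beta, probes, k=2)
print(selected)
```

On real 450K data the same ranking would be applied with k = 10,000; an array library would be the practical choice at that scale.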
Among the models evaluated, the SVM achieved the highest performance, with a mean balanced accuracy of 98% and a macro-averaged AUC of 0.99 on an independent validation set. This suggests that the model effectively captures the methylation patterns relevant to medulloblastoma subgroup classification.
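For readers unfamiliar with the metric, balanced accuracy is the mean of the per-class recalls, so a rare subgroup counts as much as a common one. The sketch below shows the standard definition on made-up labels; the helper `balanced_accuracy` and the toy predictions are illustrative assumptions and are not the study's evaluation code or data.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    total = defaultdict(int)
    correct = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Made-up labels: 2 WNT samples (one misclassified) and 6 SHH-Infant samples.
y_true = ["WNT"] * 2 + ["SHH-Infant"] * 6
y_pred = ["WNT", "SHH-Infant"] + ["SHH-Infant"] * 6

score = balanced_accuracy(y_true, y_pred)
plain = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(score, plain)
```

Here plain accuracy is 7/8 while balanced accuracy is (0.5 + 1.0) / 2, which shows how the balanced metric penalizes errors in the smaller class more heavily.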
The developed SVM-based model provides a robust framework for accurate classification of medulloblastoma subgroups using DNA methylation data. Integrating this model into clinical decision making could enhance subgroup-directed therapies and improve patient outcomes.