Du Lixin, Wang Pan, Qiu Xiaoting, Li Zhigang, Ma Jianlan, Chen Pengfei
Department of Medical Imaging, Shenzhen Longhua District Key Laboratory of Neuroimaging, Shenzhen Longhua District Central Hospital, Shenzhen, 518110, China.
Discov Oncol. 2025 Jan 13;16(1):38. doi: 10.1007/s12672-025-01792-0.
Glioblastoma multiforme (GBM) is a highly aggressive brain cancer with poor prognosis and limited treatment options. Despite advances in understanding its molecular mechanisms, effective therapeutic strategies remain elusive due to the tumor's genetic complexity and heterogeneity.
This study integrated 113 machine learning algorithms with Mendelian randomization (MR) analysis to investigate the molecular underpinnings of GBM. Five publicly available gene expression datasets were analyzed to identify differentially expressed genes (DEGs) associated with GBM, and weighted gene co-expression network analysis (WGCNA) was used to identify GBM-related gene modules. Gene set enrichment and gene set variation analyses were then conducted to explore the biological pathways involved. The machine learning models were evaluated with receiver operating characteristic (ROC) curves and confusion matrices, and the best-performing model was validated on external datasets. MR analysis was performed to assess causal relationships between genetically predicted gene expression levels and GBM outcomes.
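The two evaluation tools named above, ROC curves and confusion matrices, can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration, and only the Python standard library is used.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve, computed as the probability that a randomly
    chosen positive sample scores higher than a randomly chosen negative one
    (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def confusion_matrix(labels, scores, threshold=0.5):
    """Return (tp, fp, tn, fn) after thresholding the scores.
    The 0.5 default is an illustrative assumption."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        pred = 1 if s >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    """Fraction of samples classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


# Hypothetical toy data: 1 = tumor, 0 = normal; scores are model outputs.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]

auc = roc_auc(labels, scores)
cm = confusion_matrix(labels, scores)
acc = accuracy(*cm)
```

Because the AUC depends only on how the scores rank the samples, it does not change with the threshold, whereas the confusion matrix and the accuracy derived from it do.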
The study identified 286 DEGs between GBM and adjacent normal tissues across five datasets. WGCNA highlighted the yellow module as the most relevant to GBM, containing key genes such as KLHL3, FOXO4, and MAP1A. Of the 113 machine learning models tested, Ridge regression achieved the highest area under the curve (AUC) of 0.92, demonstrating robust predictive accuracy. Validation using external datasets confirmed the model's reliability, with a classification accuracy of 89.5% in the training set and 85.3% in the validation sets. MR analysis provided strong evidence of a causal relationship between the expression levels of the identified genes and GBM risk.
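The abstract does not state which MR estimator was used. As one common possibility, the sketch below shows an inverse-variance weighted (IVW) estimate built from per-variant Wald ratios. The function names and the summary statistics are hypothetical, and the standard error uses the first-order approximation that ignores uncertainty in the variant-exposure effects.

```python
import math


def wald_ratios(beta_exposure, beta_outcome, se_outcome):
    """Per-variant causal estimates: outcome effect divided by exposure effect.
    Standard errors use the first-order approximation se_out / |beta_exp|."""
    ratios = [bo / be for be, bo in zip(beta_exposure, beta_outcome)]
    ses = [so / abs(be) for be, so in zip(beta_exposure, se_outcome)]
    return ratios, ses


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance weighted combination of the Wald ratios.
    Returns (estimate, standard_error)."""
    ratios, ses = wald_ratios(beta_exposure, beta_outcome, se_outcome)
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    estimate = sum(w * r for w, r in zip(weights, ratios)) / total
    return estimate, math.sqrt(1.0 / total)


# Hypothetical summary statistics for three genetic variants:
# effects on gene expression (exposure) and on the log-odds of disease (outcome).
beta_exposure = [0.10, 0.20, 0.25]
beta_outcome = [0.05, 0.10, 0.125]
se_outcome = [0.01, 0.01, 0.01]

estimate, se = ivw_estimate(beta_exposure, beta_outcome, se_outcome)
odds_ratio = math.exp(estimate)  # effect per unit increase in expression
ci_low = math.exp(estimate - 1.96 * se)
ci_high = math.exp(estimate + 1.96 * se)
```

Variants with stronger effects on expression receive larger weights, so they dominate the pooled estimate. In practice, IVW results are usually accompanied by sensitivity analyses for pleiotropy, which this sketch omits.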
This study demonstrates the value of combining machine learning with Mendelian randomization to uncover novel genetic markers for GBM. The identified genes show promise as biomarkers for GBM diagnosis and therapy and may open new avenues for personalized treatment strategies.