Wang Hehe, Zhang Junge, Cheng Peng, Yu Lujie, Li Chunlin, Wang Yaowen
Department of Otolaryngology, Head and Neck Surgery, The First Affiliated Hospital of Ningbo University, Ningbo, China.
Department of Anesthesiology, The First Affiliated Hospital of Ningbo University, Ningbo, China.
Discov Oncol. 2025 Jun 12;16(1):1067. doi: 10.1007/s12672-025-02932-2.
Nasopharyngeal carcinoma (NPC) lacks biomarkers demonstrating both high specificity and sensitivity for early diagnosis. This study aimed to develop robust machine learning (ML)-driven diagnostic models and identify key biomarkers through integrated analysis of multi-cohort transcriptomic data.
Seven NPC transcriptomic datasets were integrated and preprocessed, with ComBat used for batch effect correction. Four datasets (GSE12452, GSE40290, GSE53819, and GSE64634) were merged to form the training cohort, while three (GSE13597, GSE34573, and GSE61218) served as independent external validation sets. Differential expression analysis identified 293 differentially expressed genes (DEGs). Twelve ML algorithms (including Stepglm, glmBoost, and RF) were systematically combined into 113 distinct models to classify NPC versus normal tissues. Top-performing models underwent external validation. Immune infiltration patterns and functional enrichment were analyzed using CIBERSORT and GSEA/GSVA, respectively.
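The abstract does not state how twelve algorithms yield 113 models. One plausible reading, suggested by names such as Stepglm[both]-RF, is that a feature-selection step is paired with a downstream classifier. The sketch below illustrates only that pairing idea. The algorithm lists are shortened and partly hypothetical ("Lasso" is not named in the abstract), and the function `enumerate_models` is an illustrative helper, so the resulting count is not meant to reproduce the authors' 113 combinations.

```python
from itertools import product

# Hypothetical, shortened lists; the study itself used twelve algorithms.
SELECTORS = ["Stepglm[both]", "glmBoost", "Lasso"]
CLASSIFIERS = ["RF", "glmBoost", "Lasso"]


def enumerate_models(selectors, classifiers):
    """Return single-algorithm models plus every selector-classifier pair.

    Pairing an algorithm with itself is skipped, since selecting and
    classifying with the same method adds no new model.
    """
    models = list(classifiers)  # each classifier used on its own
    for selector, classifier in product(selectors, classifiers):
        if selector != classifier:
            models.append(f"{selector}-{classifier}")
    return models


models = enumerate_models(SELECTORS, CLASSIFIERS)
```

With these shortened lists the helper produces ten candidate models, including the two hybrids reported in the results. Each candidate would then be fitted on the training cohort and ranked by its AUC across cohorts.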
The Stepglm[both]-RF hybrid model demonstrated exceptional performance with AUCs of 0.999 (training set; 95% CI: 0.997-1.000), 1.000 (GSE61218/GSE34573 validation), and 0.960 (GSE13597 validation). The glmBoost-RF model showed comparable efficacy, achieving AUCs of 1.000 (training), 0.950 (GSE61218), 1.000 (GSE34573), and 0.947 (GSE13597). Single-gene analysis identified RCN1 as a promising diagnostic marker (AUC = 0.953), with elevated expression levels correlating with poor prognosis in head and neck squamous cell carcinoma (HNSCC; p < 0.05). Immune profiling revealed significant enrichment of M1 macrophages and concomitant reduction of memory B cells in NPC. Functional enrichment analysis associated RCN1 with cell cycle regulation and immune-related pathways.
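To make the reported AUCs and the 95% CI concrete, the following sketch computes an AUC as the probability that a randomly chosen tumor sample scores higher than a randomly chosen normal sample, then estimates a percentile bootstrap interval. The abstract does not say how its confidence interval was obtained, so the bootstrap is an assumption, and the scores are made-up values rather than data from the study.

```python
import random


def auc(scores_pos, scores_neg):
    """AUC as P(positive score > negative score); ties count as 0.5."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def bootstrap_ci(scores_pos, scores_neg, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI, resampling each class with replacement."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        pos = rng.choices(scores_pos, k=len(scores_pos))
        neg = rng.choices(scores_neg, k=len(scores_neg))
        stats.append(auc(pos, neg))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Made-up model scores for illustration only (not from the study).
tumor = [0.91, 0.85, 0.78, 0.66, 0.95, 0.40]
normal = [0.10, 0.35, 0.22, 0.45, 0.05]

point = auc(tumor, normal)
low, high = bootstrap_ci(tumor, normal)
```

An AUC near 1.000, as reported for several cohorts, means almost every tumor sample outranks every normal sample. In small validation sets this also makes the interval estimate sensitive to a handful of samples.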
This study establishes two high-performance ML models (Stepglm[both]-RF and glmBoost-RF) with low variability for NPC diagnosis and identifies RCN1 as a dual-function biomarker with diagnostic and prognostic potential. The findings provide a scalable framework for early NPC detection and novel insights into immune microenvironment dysregulation.