Duan Guangliang, Huo Qi, Ni Wei, Ding Fei, Ye Yuefang, Tang Tingting, Dai Huiping
Department of Oncology, The Affiliated Hospital of Hangzhou Normal University, Hangzhou, 310015, Zhejiang, People's Republic of China.
Department of Gastroenterology, The Affiliated Hospital of Hangzhou Normal University, Hangzhou, 310015, Zhejiang, People's Republic of China.
Discov Oncol. 2025 May 23;16(1):886. doi: 10.1007/s12672-025-02560-w.
Lung squamous cell carcinoma (LUSC) is a leading cause of cancer-related mortality, and tumor heterogeneity can give rise to subtypes with distinct prognoses. Traditional prognostic factors, such as tumor, node, and metastasis (TNM) staging, offer limited predictive accuracy. This study aims to identify LUSC subtypes and to develop predictive models that could improve the accuracy of prognosis prediction and support personalized treatment.
Expression and clinical data were collected from three datasets. One dataset (TCGA-LUSC) served as the training set, and the other two (GSE30219 and GSE73403) served as independent testing sets. Unsupervised clustering was applied to the training set to identify LUSC subtypes. The relationship between these subtypes and survival outcomes was validated in the testing sets using binary machine learning models and survival curve analysis. The impact of chemotherapy on prognosis within each subtype was also assessed. Subsequently, four survival machine learning models were developed to predict LUSC prognosis. These models were validated in the testing sets and integrated into an online tool to assist in survival prediction.
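To illustrate the survival curve analysis mentioned above, the following is a minimal sketch of a Kaplan-Meier estimator in plain Python. It is not the authors' code: the abstract does not name the software used, the function name `kaplan_meier` is hypothetical, and the follow-up times and event flags are invented toy values rather than data from the study.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate for right-censored data.

    times  -- follow-up time per patient
    events -- 1 if death was observed at that time, 0 if censored
    Returns a list of (time, survival probability) at each event time.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei)
        leaving = sum(1 for ti in times if ti == t)  # deaths + censored at t
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Toy example only: two hypothetical subtype groups with made-up follow-up.
c1_curve = kaplan_meier([1, 2, 3, 4, 5], [1, 0, 1, 1, 0])
c2_curve = kaplan_meier([2, 4, 6, 8, 10], [0, 1, 0, 0, 1])
for label, curve in (("C1", c1_curve), ("C2", c2_curve)):
    print(label, [(t, round(s, 3)) for t, s in curve])
```

In practice the curves for the two subtypes would be compared with a formal test such as the log-rank test, which this sketch omits.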
Two subtypes, C1 and C2, were identified in the training set. The C1 subtype was associated with poorer survival outcomes and was enriched in cancer-associated fibroblasts and macrophages. In contrast, the C2 subtype was associated with better outcomes and was enriched in CD8+ T cells. Among the subtype-by-treatment groups, C2 patients who received chemotherapy had the best survival outcomes. A 9-gene signature, derived from the feature-importance values of the subtype-prediction model, comprised TGM2, AOC3, TBXA2R, RGS3, DLC1, MMP19, ACVRL1, TCF21, and TIMP3. This signature outperformed 14 published signatures and clinical variables in survival prediction, achieving the highest time-dependent AUC (tdAUC) and concordance index (C-index). Four machine learning models were developed using this signature, achieving tdAUC values of 0.712 and 0.684 and C-index values of 0.682 and 0.625 in the independent testing sets. An online tool for predicting survival probabilities for LUSC patients up to 10 years post-treatment is available at https://hznuduan.shinyapps.io/LCSP/.
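For readers unfamiliar with the C-index reported above, the following is a minimal sketch of Harrell's concordance index in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the function name `concordance_index` is chosen here for clarity, pairs with tied follow-up times are skipped for simplicity, and the input values are invented toy data.

```python
from itertools import combinations


def concordance_index(times, events, risks):
    """Harrell's C-index for right-censored survival data.

    times  -- follow-up time per patient
    events -- 1 if death was observed, 0 if censored
    risks  -- predicted risk score (higher means worse prognosis)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplification: ignore pairs with tied times
        # `a` is the patient with the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # shorter time is censored, so the pair is not comparable
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0  # higher risk died earlier: concordant
        elif risks[a] == risks[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy example only: four hypothetical patients.
c = concordance_index(
    times=[2, 4, 6, 8],
    events=[1, 1, 0, 1],
    risks=[0.9, 0.7, 0.8, 0.1],
)
print(c)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives a scale for interpreting the reported values of 0.682 and 0.625.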
We identified two LUSC subtypes by unsupervised clustering and, using supervised machine learning models, developed an online tool for prognosis prediction.