Hu Beibei, Yin Guohui, Zhu Jialin, Bai Yi, Sun Xuren
Department of Gastroenterology, First Affiliated Hospital of China Medical University, Shenyang, China.
Key Laboratory of Traffic Safety On Track (Central South University), Ministry of Education, School of Traffic and Transportation Engineering, Central South University, Changsha, 410075, China.
BMC Med Inform Decis Mak. 2024 Dec 18;24(1):384. doi: 10.1186/s12911-024-02794-8.
Tumor mutation burden (TMB) is considered a biomarker for the use of immune checkpoint inhibitors (ICIs), but TMB detection by whole-exome sequencing (WES) or by cancer gene panels (CGPs) based on next-generation sequencing is costly. Here, we use TCGA transcriptome data to construct a model for TMB prediction in gastrointestinal tumors.
Transcriptome data, somatic mutation data, and clinical data were obtained from TCGA for four gastrointestinal tumors: esophageal cancer (ESCA), stomach adenocarcinoma (STAD), colon adenocarcinoma (COAD), and rectal adenocarcinoma (READ). Using R, we visualized the somatic mutation data, performed functional enrichment analysis of differentially expressed genes (DEGs) and gene set enrichment analysis (GSEA), and evaluated TMB in relation to clinical data. Finally, a deep neural network (DNN) model was constructed for TMB prediction.
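The abstract does not state how TMB was derived from the somatic mutation data. A common convention is the number of somatic mutations per megabase of sequenced exome, and the minimal sketch below assumes that convention. The function name `tumor_mutation_burden` and the default exome size of 38 Mb are illustrative assumptions, not values taken from the study.

```python
def tumor_mutation_burden(n_somatic_mutations: int, exome_size_mb: float = 38.0) -> float:
    """Return TMB as mutations per megabase.

    Assumption: TMB = mutation count / captured exome size in Mb.
    The 38 Mb default is a frequently used approximation, not a
    value reported in the abstract.
    """
    if n_somatic_mutations < 0:
        raise ValueError("mutation count must be non-negative")
    if exome_size_mb <= 0:
        raise ValueError("exome size must be positive")
    return n_somatic_mutations / exome_size_mb


# Hypothetical sample with 380 somatic mutations.
example_tmb = tumor_mutation_burden(380)
```

With these assumed inputs, the hypothetical sample has a TMB of 10 mutations/Mb. The capture size should be replaced with that of the sequencing kit actually used.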
Visualization of the somatic mutation data summarized the mutation classification, the frequency of each mutation type, and the most frequently mutated genes. GSEA showed enrichment of CD4/CD8 T cells and activation of tumor-suppressing pathways in the high-TMB group. Single-sample GSEA (ssGSEA) showed that the high-TMB group had higher infiltration levels of multiple immune cell types. In addition, the distribution of TMB was related to clinical parameters such as age, M stage, N stage, AJCC stage, and overall survival (OS). After model optimization using a genetic algorithm, the Pearson correlation coefficient r between predicted and actual values reached 0.98, 0.82, and 0.92 in the training, validation, and testing sets, respectively; the coefficient of determination R2 was 0.95, 0.82, and 0.7, respectively.
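The two reported metrics measure different things: r captures how linearly the predictions track the actual values, while R2 also penalizes systematic offsets, so r can be high while R2 is noticeably lower. The following minimal sketch uses the standard textbook definitions of both metrics. The function names and the toy data are illustrative assumptions and are not the study's code or data.

```python
import math


def pearson_r(actual, predicted):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(actual)
    if n != len(predicted) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Toy data (not from the study): predictions are offset by a constant.
actual_tmb = [1.0, 2.0, 3.0, 4.0]
predicted_tmb = [2.0, 3.0, 4.0, 5.0]

r = pearson_r(actual_tmb, predicted_tmb)
r2 = r_squared(actual_tmb, predicted_tmb)
```

In this toy case the predictions are perfectly correlated with the actual values (r of 1) yet R2 is only about 0.2, because every prediction is off by one unit.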
TMB correlates with clinicopathological parameters in gastrointestinal carcinoma, and patients with high TMB have higher levels of immune infiltration. In addition, the DNN model based on 31 genes predicts the TMB of gastrointestinal tumors with high accuracy.