Experimental and Clinical Pharmacology Unit, Centro di Riferimento Oncologico di Aviano (CRO) IRCCS, 33081 Aviano, Italy.
Department of Electrical, Computer and Biomedical Engineering, University of Pavia, 27100 Pavia, Italy.
Cells. 2021 Mar 5;10(3):576. doi: 10.3390/cells10030576.
Gliomas are the most common primary neoplasms of the central nervous system. Epigenetics represents a promising frontier in defining glioma prognosis and treatment. In this study, we developed a machine learning classification model based on epigenetic data (CpG probes) to separate patients according to their state of immunosuppression. We considered 573 cases of low-grade glioma (LGG) and glioblastoma (GBM) from The Cancer Genome Atlas (TCGA). First, from gene expression data, we derived a novel binary indicator to flag patients with a favorable immune state. Then, based on previous studies, we selected the genes related to the immune state of the tumor microenvironment. Next, we refined the selection with a data-driven procedure based on Boruta. Finally, we tuned, trained, and evaluated both random forest and neural network classifiers on the resulting dataset. We found that a multi-layer perceptron network fed with the 338 probes selected by combining expert choice and Boruta gave the best performance, achieving an out-of-sample accuracy of 82.8%, a Matthews correlation coefficient of 0.657, and an area under the ROC curve of 0.9. Based on the proposed model, we provide a method to stratify glioma patients according to their epigenomic state.
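The three figures of merit reported in the abstract (accuracy, Matthews correlation coefficient, area under the ROC curve) can be illustrated with a minimal sketch for a binary classifier. The function names, the 0.5 decision threshold, and the toy labels and scores below are assumptions made for illustration only; they are not the study's data or code.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Fraction of samples classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC in [-1, 1]; defined here as 0.0 when the denominator vanishes."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random
    negative, with ties counted as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy example (invented values): 1 = favorable immune state, 0 = otherwise.
    y_true = [1, 1, 1, 1, 0, 0, 0, 0]
    scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
    y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

    counts = confusion_counts(y_true, y_pred)
    print("accuracy:", accuracy(*counts))
    print("MCC:", matthews_corrcoef(*counts))
    print("AUC:", roc_auc(y_true, scores))
```

Unlike accuracy, the Matthews correlation coefficient uses all four cells of the confusion matrix, which is why it is often reported alongside accuracy when the two classes may be unbalanced.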