Chen Yuxing, Yan Yixin, Xu Moping, Chen Wen, Lin Jinyu, Zhao Yan, Wu Junze, Wang Xianlong
Department of Bioinformatics, School of Basic Medical Sciences, School of Medical Technology and Engineering, Key Laboratory of Medical Bioinformatics, Key Laboratory of Ministry of Education for Gastrointestinal Cancer, Fujian Medical University, Fuzhou, China.
Fujian Stomatological Hospital, Fujian Medical University, Fuzhou, China.
Front Bioinform. 2021 Nov 8;1:744345. doi: 10.3389/fbinf.2021.744345. eCollection 2021.
More than 150 types of brain tumors have been documented. Accurate diagnosis is important for making appropriate therapeutic decisions. The goal of this study was to develop a DNA methylation profile-based classifier that accurately identifies various kinds of brain tumors. Thirteen DNA methylation profile datasets were downloaded from the Gene Expression Omnibus (GEO) database: GSE90496 was used as the training set, GSE109379 as the validation set, and the remaining 11 datasets as the independent test set. The random forest algorithm was used to select CpG sites based on feature importance, and a multilayer perceptron (MLP) model was trained to classify the samples. Deconvolution with the debCAM package was used to explore differences in cellular composition among tumors. From the training dataset of 2,801 samples, 396,568 CpG sites were retained after preprocessing, of which 767 were selected as modeling features. A three-layer MLP model with 1,320 nodes in its hidden layer was developed to predict the histological types of brain tumors. The prediction accuracy was 99.2%, 87.0%, and 96.58% on the training, validation, and test sets, respectively. The deconvolution analysis showed that cell proportions differed among tumor subtypes and were approximately sufficient to distinguish different tumor entities. We developed a robust classifier for central nervous system tumors and explored the reasons underlying its classification performance.
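The two modeling steps summarized above (ranking CpG sites by random-forest importance, then training an MLP on the selected sites) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' code: the data are synthetic, and the use of scikit-learn, impurity-based `feature_importances_` as the importance measure, and the values of `k`, `n_estimators` and `max_iter` are all assumptions. Only the single 1,320-node hidden layer comes from the abstract. The debCAM deconvolution step is not covered here.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Hypothetical stand-in for a methylation matrix: rows are samples, columns are
# CpG sites. The study used real GEO data (2,801 training samples, 396,568 sites).
X, y = make_classification(
    n_samples=600, n_features=500, n_informative=30,
    n_classes=4, n_clusters_per_class=1, random_state=0,
)
X = 1.0 / (1.0 + np.exp(-X))  # squash into (0, 1) to mimic beta values

X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Step 1 (assumed details): rank CpG sites by random-forest feature importance,
# computed on the training split only, and keep the top k (767 in the study).
rf = RandomForestClassifier(n_estimators=200, random_state=0)
rf.fit(X_train, y_train)
k = 50
top_sites = np.argsort(rf.feature_importances_)[::-1][:k]

# Step 2: fit an MLP with one hidden layer (input -> hidden -> output, i.e. a
# three-layer network) of 1,320 nodes, using only the selected sites.
mlp = MLPClassifier(hidden_layer_sizes=(1320,), max_iter=500, random_state=0)
mlp.fit(X_train[:, top_sites], y_train)

val_accuracy = mlp.score(X_val[:, top_sites], y_val)
print(f"selected {len(top_sites)} sites, validation accuracy = {val_accuracy:.3f}")
```

Fitting the forest on the training split alone keeps the held-out samples out of feature selection, loosely mirroring the abstract's separation of GSE90496 (training) from GSE109379 (validation).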