Munquad Sana, Das Asim Bikas
Department of Biotechnology, National Institute of Technology Warangal, Warangal, Telangana, 506004, India.
BioData Min. 2023 Nov 15;16(1):32. doi: 10.1186/s13040-023-00349-7.
The classification of glioma subtypes is essential for precision therapy. Because gliomas are heterogeneous, subtype-specific molecular patterns can be captured by integrating and analyzing high-throughput omics data from different genomic layers. A deep-learning framework makes it possible to integrate multi-omics data and classify glioma subtypes in support of clinical diagnosis.
Transcriptome and methylome data of glioma patients were preprocessed, and differentially expressed features were identified in both datasets. A Cox regression analysis then determined the genes and CpGs associated with survival, and gene set enrichment analysis was carried out to examine the biological significance of these features. Further, we identified CpG-gene pairs by mapping CpGs to the promoter regions of their corresponding genes. The methylation and gene expression levels of these CpGs and genes were embedded in a lower-dimensional space with an autoencoder. Next, an artificial neural network (ANN) and a convolutional neural network (CNN) were used to classify subtypes from the latent features of the embedding space. The CNN performed better than the ANN for subtyping lower-grade gliomas (LGG) and glioblastoma multiforme (GBM). The subtyping accuracy of the CNN was 98.03% (± 0.06) in LGG and 94.07% (± 0.01) in GBM. The precision of the models was 97.67% in LGG and 90.40% in GBM, and the sensitivity was 96.96% in LGG and 91.18% in GBM. The CNN also showed superior performance on external datasets. The CpG-gene pairs used to develop the model gave better performance than random CpG-gene pairs, preprocessed data, and single-omics data.
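The CpG-gene pairing step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the promoter window, so the 1500 bp upstream / 500 bp downstream window around the transcription start site (TSS) is an assumption, and the gene names, probe identifiers, and coordinates are made-up toy values. The helper names `promoter_bounds` and `pair_cpgs_with_genes` are hypothetical.

```python
from collections import namedtuple

# Assumed promoter window relative to the transcription start site (TSS).
# The abstract does not report the window the authors used.
UPSTREAM = 1500
DOWNSTREAM = 500

Gene = namedtuple("Gene", "name chrom tss strand")
CpG = namedtuple("CpG", "probe_id chrom pos")


def promoter_bounds(gene, upstream=UPSTREAM, downstream=DOWNSTREAM):
    """Return the (start, end) promoter interval, accounting for strand."""
    if gene.strand == "+":
        return gene.tss - upstream, gene.tss + downstream
    # On the minus strand, "upstream" lies at higher genomic coordinates.
    return gene.tss - downstream, gene.tss + upstream


def pair_cpgs_with_genes(cpgs, genes):
    """Return sorted (probe_id, gene_name) pairs for CpGs inside a promoter."""
    pairs = []
    for gene in genes:
        start, end = promoter_bounds(gene)
        for cpg in cpgs:
            if cpg.chrom == gene.chrom and start <= cpg.pos <= end:
                pairs.append((cpg.probe_id, gene.name))
    return sorted(pairs)


# Toy, made-up coordinates for illustration only.
genes = [
    Gene("GENE_A", "chr1", 10_000, "+"),
    Gene("GENE_B", "chr1", 50_000, "-"),
]
cpgs = [
    CpG("cg_toy_1", "chr1", 9_200),   # 800 bp upstream of GENE_A
    CpG("cg_toy_2", "chr1", 12_000),  # outside both promoter windows
    CpG("cg_toy_3", "chr1", 51_000),  # 1000 bp upstream of GENE_B (minus strand)
    CpG("cg_toy_4", "chr2", 10_100),  # matching position, different chromosome
]

pairs = pair_cpgs_with_genes(cpgs, genes)
print(pairs)
```

In a pipeline like the one described, the input lists would presumably be restricted to the survival-associated genes and CpGs first, and the methylation and expression values of the resulting pairs would then be passed to the autoencoder. For real annotation sizes, an interval index would be preferable to this quadratic loop.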
This study showed that a novel feature selection and data integration strategy led to the development of DeepAutoGlioma, an effective framework for diagnosing glioma subtypes.