Franco Edian F, Rana Pratip, Cruz Aline, Calderón Víctor V, Azevedo Vasco, Ramos Rommel T J, Ghosh Preetam
Institute of Biological Sciences, Federal University of Para, Belem, PA 66075-110, Brazil.
Laboratory of Virology and Environmental Genomics, Instituto de Innovacion en Biotecnologia e Industria (IIBI), Santo Domingo 10104, Dominican Republic.
Cancers (Basel). 2021 Apr 22;13(9):2013. doi: 10.3390/cancers13092013.
A heterogeneous disease such as cancer is activated through multiple pathways and different perturbations. Depending on the activated pathway(s), patient survival varies significantly, and patients respond differently to various drugs. Therefore, cancer subtype detection using genomic-level data is an important research problem. Subtype detection is often complex and, in most cases, requires multi-omics data fusion to achieve accurate subtyping. Various data fusion and subtyping approaches have been proposed over the years, such as kernel-based fusion, matrix factorization, and deep learning autoencoders. In this paper, we compared the performance of different deep learning autoencoders for cancer subtype detection. We performed subtype detection on four cancer types from The Cancer Genome Atlas (TCGA) datasets using four autoencoder implementations. We also predicted the optimal number of subtypes for a given cancer type using the silhouette score and found that the detected subtypes exhibit significant differences in survival profiles. Furthermore, we compared the effects of feature selection and similarity measures on subtype detection. For further evaluation, we used the glioblastoma multiforme (GBM) dataset and identified the differentially expressed genes in each subtype. The results are consistent with other genomic studies and can be corroborated by the pathways and biological functions involved. Thus, the results from the autoencoders, obtained by integrating different cancer data types, can be used to predict and characterize patient subgroups and survival profiles.
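The abstract states that the optimal number of subtypes was chosen with the silhouette score. A minimal sketch of that selection step follows, under assumptions not taken from the paper: the autoencoder has already produced one latent vector per patient, patients are grouped with plain k-means using Euclidean distance (with a deterministic farthest-first seeding), and the k with the highest mean silhouette coefficient s(i) = (b - a) / max(a, b) is kept. The helper names (`kmeans`, `silhouette`, `choose_num_subtypes`) and the toy data are hypothetical illustrations, not the authors' code or TCGA data.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Sequence[float]


def euclidean(a: Point, b: Point) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def farthest_first_init(points: List[Point], k: int) -> List[List[float]]:
    """Assumed seeding: start at points[0], then repeatedly add the point
    farthest from its nearest already-chosen centroid."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        idx = max(range(len(points)),
                  key=lambda i: min(euclidean(points[i], c) for c in centroids))
        centroids.append(list(points[idx]))
    return centroids


def kmeans(points: List[Point], k: int, max_iter: int = 100) -> List[int]:
    """Lloyd's k-means (assumed clustering step); one label per patient."""
    centroids = farthest_first_init(points, k)
    labels: List[int] = []
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: euclidean(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def silhouette(points: List[Point], labels: List[int]) -> float:
    """Mean silhouette coefficient; points in singleton clusters score 0."""
    clusters: Dict[int, List[int]] = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two non-empty clusters")
    total = 0.0
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            continue  # singleton cluster contributes s(i) = 0
        # a: mean distance to the other members of the point's own cluster
        a = sum(euclidean(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(sum(euclidean(points[i], points[j]) for j in idx) / len(idx)
                for other, idx in clusters.items() if other != lab)
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


def choose_num_subtypes(latent: List[Point],
                        k_range: range = range(2, 7)) -> Tuple[int, Dict[int, float]]:
    """Score every candidate k and return the one with the best silhouette."""
    scores = {k: silhouette(latent, kmeans(latent, k)) for k in k_range}
    best = max(scores, key=lambda k: scores[k])
    return best, scores


if __name__ == "__main__":
    # Toy 2-D "latent embeddings" with three well-separated groups.
    demo = [(0, 0), (0, 1), (1, 0), (1, 1),
            (10, 10), (10, 11), (11, 10), (11, 11),
            (20, 0), (20, 1), (21, 0), (21, 1)]
    k, scores = choose_num_subtypes(demo)
    print(k, {kk: round(v, 3) for kk, v in scores.items()})
```

On the toy data the three groups should be recovered as the best k. In a real pipeline a library implementation of k-means and the silhouette score would typically replace these hand-written helpers, and the clustering algorithm and distance (similarity) measure are exactly the choices whose effect the abstract says was compared.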