Bioinformatics, Interdisciplinary Center for Biotechnology Research, University of Florida, Gainesville, FL, USA.
Department of Ophthalmology & Visual Sciences, University of Michigan, Ann Arbor, MI, USA.
Sci Rep. 2019 Nov 13;9(1):16668. doi: 10.1038/s41598-019-53048-x.
Clear cell renal cell carcinoma (ccRCC) is highly heterogeneous and is the most lethal of the urologic cancers. We developed an unsupervised deep learning method based on stacked denoising autoencoders (SdA) that integrates multi-platform genomic data to subtype ccRCC, with the goal of assisting diagnosis, personalized treatment and prognosis. Using five genomic datasets for Kidney Renal Clear Cell Carcinoma (KIRC) from The Cancer Genome Atlas (TCGA), we identified two subtypes of ccRCC. Correlation analysis between the final reconstructed input and the original input data showed that all five types of genomic data contribute positively to the identification of the subtypes. Patients in the first subtype had significantly lower survival probability, higher neoplasm histologic grade and higher pathologic stage than patients in the other subtype. Furthermore, we identified a set of genes, proteins and miRNAs that were differentially expressed (DE) between the two subtypes. The functional annotation of the DE genes from pathway analysis is consistent with the clinical features. Importantly, we applied the model learned from KIRC as a pre-trained model to two independent TCGA datasets, Lung Adenocarcinoma (LUAD) and Low Grade Glioma (LGG), and the model stratified the LUAD and LGG patients into clinically associated subtypes. The successful application of our method to independent groups of patients suggests that the SdA method and the model learned from KIRC are effective for subtyping cancer patients and can most likely be used for other similar tasks. The source code and the models are available at https://github.com/tjgu/cancer_subtyping to assist similar studies.
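To make the approach described in the abstract concrete, the following is a minimal NumPy sketch of a two-layer stacked denoising autoencoder applied to concatenated feature blocks, followed by a per-block correlation between the reconstruction and the original input. It is not the authors' implementation (see the repository linked above for that). The toy data, the block names `expr` and `meth`, the layer sizes, the masking-noise level, the tied weights, and the two-cluster k-means step used to assign subtypes are all assumptions chosen for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def minmax(x):
    # Scale every feature to [0, 1] so a sigmoid decoder can reproduce it.
    lo, hi = x.min(axis=0), x.max(axis=0)
    return (x - lo) / np.where(hi > lo, hi - lo, 1.0)

class DenoisingAutoencoder:
    """One denoising autoencoder layer with tied weights (decoder uses W.T)."""

    def __init__(self, n_visible, n_hidden, rng):
        limit = np.sqrt(6.0 / (n_visible + n_hidden))
        self.W = rng.uniform(-limit, limit, size=(n_visible, n_hidden))
        self.b_hid = np.zeros(n_hidden)
        self.b_vis = np.zeros(n_visible)
        self.rng = rng

    def encode(self, x):
        return sigmoid(x @ self.W + self.b_hid)

    def decode(self, h):
        return sigmoid(h @ self.W.T + self.b_vis)

    def fit(self, x, corruption=0.2, lr=0.1, epochs=200):
        n = x.shape[0]
        for _ in range(epochs):
            # Masking noise: zero out a random fraction of the inputs.
            x_noisy = x * (self.rng.random(x.shape) > corruption)
            h = self.encode(x_noisy)
            x_hat = self.decode(h)
            # Cross-entropy gradients, measured against the clean input.
            d_vis = (x_hat - x) / n
            d_hid = (d_vis @ self.W) * h * (1.0 - h)
            self.W -= lr * (x_noisy.T @ d_hid + d_vis.T @ h)
            self.b_vis -= lr * d_vis.sum(axis=0)
            self.b_hid -= lr * d_hid.sum(axis=0)
        return self

def two_means(z, iters=20):
    # Plain k-means with k=2; an assumed stand-in for subtype assignment.
    c0 = z[0]
    c1 = z[np.argmax(((z - c0) ** 2).sum(axis=1))]
    centers = np.stack([c0, c1])
    for _ in range(iters):
        dist = ((z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for k in range(2):
            if np.any(labels == k):
                centers[k] = z[labels == k].mean(axis=0)
    return labels

rng = np.random.default_rng(0)

# Hypothetical toy data: 40 "patients", two blocks standing in for two platforms.
group = np.repeat([0, 1], 20)
expr = rng.normal(group[:, None] * 2.0, 1.0, size=(40, 30))
meth = rng.normal(group[:, None] * -1.5, 1.0, size=(40, 20))
blocks = {"expr": slice(0, 30), "meth": slice(30, 50)}
x = minmax(np.hstack([expr, meth]))

# Greedy layer-wise training: layer 2 learns from the codes of layer 1.
layer1 = DenoisingAutoencoder(x.shape[1], 16, rng).fit(x)
h1 = layer1.encode(x)
layer2 = DenoisingAutoencoder(16, 4, rng).fit(h1)
code = layer2.encode(h1)

# Reconstruct the input by decoding back through both layers.
x_rec = layer1.decode(layer2.decode(code))

# Assign each sample to one of two candidate subtypes from the top-level code.
labels = two_means(code)

# Correlation between reconstruction and original input, per data block.
block_corr = {
    name: np.corrcoef(x[:, s].ravel(), x_rec[:, s].ravel())[0, 1]
    for name, s in blocks.items()
}
print(labels, block_corr)
```

In this sketch, a positive correlation for a block indicates that the learned code retains information from that block, which mirrors the kind of check the abstract describes for the five genomic data types. Reusing a trained model on another cohort, as done for LUAD and LGG, would correspond to calling `encode` with the fitted layers on a new matrix that has the same feature layout.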