Yin Chaoyi, Cao Yangkun, Sun Peishuo, Zhang Hengyuan, Li Zhi, Xu Ying, Sun Huiyan
School of Artificial Intelligence, Jilin University, Changchun, China.
Department of Medical Oncology, the First Hospital of China Medical University, Shenyang, China.
Front Genet. 2022 May 13;13:884028. doi: 10.3389/fgene.2022.884028. eCollection 2022.
Accurate prediction of the molecular subtypes of cancer patients is important for personalized cancer diagnosis and treatment. The large amount of available multi-omics data and advances in data-driven methods are expected to facilitate the molecular subtyping of cancer. Most existing machine learning-based methods classify samples according to single-omics data, fail to integrate multi-omics data to learn comprehensive sample representations, and ignore the fact that information transfer and aggregation among samples can improve those representations and ultimately aid classification. We propose a novel framework, the multi-omics graph convolutional network (M-GCN), for molecular subtyping based on robust graph convolutional networks that integrate multi-omics data. We first apply the Hilbert-Schmidt independence criterion least absolute shrinkage and selection operator (HSIC Lasso) to select the transcriptomic features related to molecular subtype, and then use these features to construct a low-noise sample-sample similarity graph. Next, we take the selected gene expression, single nucleotide variant (SNV), and copy number variation (CNV) data as input and learn multi-view representations of the samples. Finally, on this basis, we develop a robust variant of the graph convolutional network (GCN) model that obtains new sample representations by aggregating their subgraphs. Experimental results on breast and stomach cancer demonstrate that M-GCN achieves better classification performance than other existing methods. Moreover, the identified subtype-specific biomarkers are highly consistent with current clinical understanding and show promise for assisting accurate diagnosis and targeted drug development.
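The graph-construction and aggregation steps described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The abstract does not state the similarity measure, the neighbourhood size, how the omics views are combined, or what makes the GCN variant robust, so the sketch assumes cosine similarity, a k-nearest-neighbour graph, simple feature concatenation, and one standard (non-robust) GCN propagation step with symmetric normalization. All function names, the value of `k`, and the random input matrices are hypothetical, and the HSIC Lasso selection step is omitted.

```python
import numpy as np

def knn_similarity_graph(X, k=5):
    """Symmetric k-nearest-neighbour adjacency from cosine similarity (assumed measure)."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xn = X / np.clip(norms, 1e-12, None)      # unit-length rows
    S = Xn @ Xn.T                             # cosine similarity between samples
    np.fill_diagonal(S, -np.inf)              # exclude self when ranking neighbours
    n = X.shape[0]
    idx = np.argsort(-S, axis=1)[:, :k]       # k most similar samples per row
    A = np.zeros((n, n))
    A[np.repeat(np.arange(n), k), idx.ravel()] = 1.0
    return np.maximum(A, A.T)                 # keep an edge if either side chose it

def normalize_adjacency(A):
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2} with self-loops."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(A_norm, H, W):
    """One propagation step: ReLU(A_norm @ H @ W)."""
    return np.maximum(A_norm @ H @ W, 0.0)

# Random stand-ins for the three omics views (hypothetical shapes and values).
rng = np.random.default_rng(0)
n_samples = 20
expr = rng.normal(size=(n_samples, 30))       # selected gene expression features
snv = rng.integers(0, 2, size=(n_samples, 10)).astype(float)
cnv = rng.normal(size=(n_samples, 10))

# The graph is built from expression features only, as in the abstract.
A = knn_similarity_graph(expr, k=3)
A_norm = normalize_adjacency(A)

# Assumption: views are combined by concatenation before aggregation.
H0 = np.concatenate([expr, snv, cnv], axis=1)
W0 = rng.normal(scale=0.1, size=(H0.shape[1], 8))  # untrained weights
H1 = gcn_layer(A_norm, H0, W0)                # aggregated sample representations
```

Here each row of `H1` mixes a sample's own features with those of its graph neighbours, which is the "information transfer and aggregation among samples" the abstract refers to. In a real setting the weights would be trained against subtype labels rather than drawn at random.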