IEEE/ACM Trans Comput Biol Bioinform. 2022 May-Jun;19(3):1472-1483. doi: 10.1109/TCBB.2020.3039511. Epub 2022 Jun 3.
The rapid growth of multi-platform genomic profiles has made multiomics data integration a pressing challenge. In this study, we present a novel network-based multiomics clustering method founded on the Wasserstein distance from optimal mass transport. This distance has many important geometric properties that make it well suited to machine learning and clustering. Our proposed method, aggregating multiomics and Wasserstein distance clustering (aWCluster), is applied to breast carcinoma as well as bladder carcinoma, colorectal adenocarcinoma, renal carcinoma, lung non-small cell adenocarcinoma, and endometrial carcinoma from The Cancer Genome Atlas project. Subtypes were characterized by the concordant effect of mRNA expression, DNA copy number alteration, and DNA methylation of genes and their neighbors in the interaction network. aWCluster successfully clusters all cancer types into classes with significantly different survival rates. In addition, a gene ontology enrichment analysis of significant genes in the low-survival subgroup of breast cancer points to the well-known phenomenon of tumor hypoxia and to the transcription factor ETS1, whose expression is induced by hypoxia. We believe aWCluster has the potential to discover novel subtypes and biomarkers by accentuating genes that have concordant multiomics measurements in their interaction network, which are challenging to find without network inference or with single-omics analysis.
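To make the idea of clustering on a Wasserstein distance concrete, the following is a minimal sketch and not the aWCluster implementation. It makes several simplifying assumptions that do not come from the abstract: each sample is a single non-negative profile over genes placed at hypothetical integer positions on a line (a 1-D stand-in for the interaction network), the profile is normalized to unit mass, and samples are grouped by average-linkage hierarchical clustering. The function names `pairwise_wasserstein` and `cluster_samples` and the toy data are illustrative only.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from scipy.stats import wasserstein_distance


def pairwise_wasserstein(profiles):
    """Return the symmetric matrix of 1-D Wasserstein distances between rows.

    Assumption: genes sit at integer positions 0..n_genes-1 on a line,
    which replaces the network geometry used by the published method.
    """
    profiles = np.asarray(profiles, dtype=float)
    n_samples, n_genes = profiles.shape
    positions = np.arange(n_genes)
    # Normalize each non-negative profile so it is a probability distribution.
    masses = profiles / profiles.sum(axis=1, keepdims=True)
    dist = np.zeros((n_samples, n_samples))
    for i in range(n_samples):
        for j in range(i + 1, n_samples):
            d = wasserstein_distance(positions, positions, masses[i], masses[j])
            dist[i, j] = dist[j, i] = d
    return dist


def cluster_samples(profiles, n_clusters=2):
    """Cluster samples by average linkage on the Wasserstein distance matrix."""
    dist = pairwise_wasserstein(profiles)
    # linkage expects a condensed (upper-triangle) distance vector.
    tree = linkage(squareform(dist, checks=False), method="average")
    return fcluster(tree, t=n_clusters, criterion="maxclust")


# Hypothetical toy data: two samples with mass on the first genes,
# two samples with mass on the last genes.
toy_profiles = np.array([
    [8.0, 1.0, 0.5, 0.5],
    [7.0, 2.0, 0.5, 0.5],
    [0.5, 0.5, 2.0, 7.0],
    [0.5, 0.5, 1.0, 8.0],
])
toy_distances = pairwise_wasserstein(toy_profiles)
toy_labels = cluster_samples(toy_profiles, n_clusters=2)
```

In this toy case the first two samples and the last two samples fall into separate clusters, because moving mass from one end of the line to the other costs far more than the small within-group differences. A network-based version would replace the 1-D positions with a ground cost derived from the gene interaction graph.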