Department of Applied Mathematics & Statistics, Stony Brook University, New York, NY, United States of America.
Department of Medical Physics, Memorial Sloan Kettering Cancer Center, New York, NY, United States of America.
PLoS One. 2022 Mar 14;17(3):e0265150. doi: 10.1371/journal.pone.0265150. eCollection 2022.
In this paper, we present a network-based clustering method, called vector Wasserstein clustering (vWCluster), based on the vector-valued Wasserstein distance derived from optimal mass transport (OMT) theory. This approach naturally integrates multi-layer representations of data on a given network, from which clusters are derived via hierarchical clustering. In this study, we applied the methodology to multi-omics data from the two largest breast cancer studies. The resulting clusters showed significantly different survival rates in Kaplan-Meier analysis in both datasets. CIBERSORT scores were compared among the identified clusters. Of the 22 CIBERSORT immune cell types, 9 differed significantly across clusters in both datasets, suggesting differences in the tumor immune microenvironment among the clusters. vWCluster can aggregate multi-omics data represented in vector form on a multi-layer network, taking into account the concordant effect of heterogeneous data, and can further identify tumor subgroups that differ in mortality.
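The pipeline described above (a Wasserstein-type distance between samples on a shared network, followed by hierarchical clustering) can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several labeled assumptions: each sample is a hypothetical array of shape (n_layers, n_nodes) with nonnegative values, each layer is rescaled to unit mass, edge costs are all 1, and the vector distance is approximated as the sum of per-layer graph Wasserstein-1 distances. That last simplification drops the transport between layers allowed by full vector-valued OMT. The graph distance is solved as a min-cost flow linear program with SciPy, and average linkage is an assumed choice of hierarchical clustering.

```python
import numpy as np
from scipy.optimize import linprog
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform


def incidence_matrix(n_nodes, edges):
    """Node-by-edge incidence matrix: +1 at the tail, -1 at the head of each edge."""
    D = np.zeros((n_nodes, len(edges)))
    for k, (i, j) in enumerate(edges):
        D[i, k] = 1.0
        D[j, k] = -1.0
    return D


def graph_w1(rho0, rho1, D):
    """Wasserstein-1 distance on a graph with unit edge costs (min-cost flow LP):
    minimize sum_e |J_e| subject to D @ J = rho0 - rho1 (net outflow = mass surplus)."""
    m = D.shape[1]
    # Split the flux J = Jp - Jn with Jp, Jn >= 0 so that |J_e| becomes linear.
    c = np.ones(2 * m)
    A_eq = np.hstack([D, -D])
    res = linprog(c, A_eq=A_eq, b_eq=rho0 - rho1, bounds=(0, None), method="highs")
    if not res.success:
        raise ValueError(res.message)
    return res.fun


def normalize_layers(sample):
    """Assumption: rescale each layer (row) to unit mass; rows must not sum to zero."""
    sample = np.asarray(sample, dtype=float)
    return sample / sample.sum(axis=1, keepdims=True)


def vector_distance(x, y, D):
    """Hypothetical simplification: sum of per-layer graph W1 distances,
    with no mass exchanged between layers."""
    return sum(graph_w1(x[l], y[l], D) for l in range(x.shape[0]))


def cluster_samples(samples, edges, n_clusters, method="average"):
    """Pairwise distance matrix, then hierarchical clustering cut into n_clusters groups."""
    normed = [normalize_layers(s) for s in samples]
    D = incidence_matrix(normed[0].shape[1], edges)
    n = len(normed)
    dist = np.zeros((n, n))
    for a in range(n):
        for b in range(a + 1, n):
            dist[a, b] = dist[b, a] = vector_distance(normed[a], normed[b], D)
    Z = linkage(squareform(dist, checks=False), method=method)
    labels = fcluster(Z, t=n_clusters, criterion="maxclust")
    return labels, dist
```

For example, on a hypothetical 4-node path network with edges (0,1), (1,2), (2,3) and two layers per sample, samples whose mass sits near node 0 fall into one cluster and samples whose mass sits near node 3 fall into the other. Moving a unit of mass from node 0 to node 3 costs 3 in a single layer.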