Zhanpeng Huang, Jiekang Wu
IEEE/ACM Trans Comput Biol Bioinform. 2022 Nov-Dec;19(6):3213-3223. doi: 10.1109/TCBB.2021.3122917. Epub 2022 Dec 8.
Clustering multiomics data is one of the major challenges in precision medicine. Integrating multiomics data for cancer subtyping can improve the understanding of cancer and reveal systems-level insights, but how to integrate these data for accurate subtyping remains a challenging research problem. To capture both the global and the local structure of omics data, a novel framework for integrating multiomics data is proposed for cancer subtyping. Multiview clustering with low-rank and sparsity constraints (MVCLRS) measures the local similarities of samples within each omics data type and obtains a global consensus structure by integrating the multiomics data. The main insight provided by MVCLRS is that low-rank sparse subspace clustering, used to construct the affinity matrix, best captures the local similarities in omics data. Extensive testing is conducted on 10 real-world multiomics cancer datasets from The Cancer Genome Atlas. Compared with 10 state-of-the-art multiomics clustering algorithms, MVCLRS performs better on the 10 cancer datasets: its clustering results have at least one enriched clinical label in nine of the ten cancer subtypes, the most of any method.
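The general idea of building a per-view affinity matrix from a low-rank, sparse self-representation and then fusing the views can be illustrated with a minimal sketch. This is not the MVCLRS algorithm itself, whose objective and solver are not given in the abstract. It assumes a generic self-representation loss 0.5 * ||D - D C||_F^2 with nuclear-norm and L1 penalties, solved by a simple heuristic that alternates the two shrinkage steps after each gradient step, and it fuses views by plain averaging. All function names and parameters (`view_affinity`, `consensus_affinity`, `spectral_labels`, `lam_rank`, `lam_sparse`) are hypothetical.

```python
import numpy as np

def soft_threshold(M, tau):
    """Entrywise shrinkage: proximal operator of tau * ||M||_1."""
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)

def svd_threshold(M, tau):
    """Singular-value shrinkage: proximal operator of tau * ||M||_*."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def view_affinity(X, lam_rank=0.1, lam_sparse=0.1, n_iter=100):
    """Affinity for one omics view. X has shape (n_samples, n_features).

    Heuristic (assumption): gradient step on 0.5 * ||D - D C||_F^2 with
    D = X.T, followed by L1 shrinkage, nuclear-norm shrinkage, and zeroing
    the diagonal so that no sample represents itself.
    """
    G = X @ X.T                                   # Gram matrix of the samples
    n = G.shape[0]
    step = 1.0 / (np.linalg.norm(G, 2) + 1e-12)   # 1 / Lipschitz constant
    C = np.zeros((n, n))
    for _ in range(n_iter):
        C = C - step * (G @ C - G)                # gradient of the data term
        C = soft_threshold(C, step * lam_sparse)  # encourage sparsity
        C = svd_threshold(C, step * lam_rank)     # encourage low rank
        np.fill_diagonal(C, 0.0)
    return 0.5 * (np.abs(C) + np.abs(C).T)        # symmetric, nonnegative

def consensus_affinity(views, **kwargs):
    """Average the max-normalised per-view affinities (simple fusion, assumed)."""
    n = views[0].shape[0]
    W = np.zeros((n, n))
    for X in views:
        A = view_affinity(X, **kwargs)
        W += A / (A.max() + 1e-12)
    return W / len(views)

def spectral_labels(W, k, n_iter=50):
    """Spectral clustering on a precomputed affinity, with a small k-means."""
    d = W.sum(axis=1) + 1e-12
    S = W / np.sqrt(np.outer(d, d))               # normalised affinity
    _, vecs = np.linalg.eigh(S)                   # eigenvalues in ascending order
    E = vecs[:, -k:]                              # top-k eigenvectors
    E = E / (np.linalg.norm(E, axis=1, keepdims=True) + 1e-12)
    centers = [E[0]]                              # farthest-point initialisation
    for _ in range(1, k):
        dist = np.min([np.sum((E - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(E[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        sq = ((E[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = np.argmin(sq, axis=1)
        for j in range(k):
            if np.any(labels == j):               # keep old center if cluster is empty
                centers[j] = E[labels == j].mean(axis=0)
    return labels
```

Given a list of sample-aligned matrices (one per omics type, rows in the same sample order), `spectral_labels(consensus_affinity(views), k)` returns one cluster label per sample. The penalty weights and the number of clusters `k` would need to be tuned per dataset.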