Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, NY 11794, USA.
Pac Symp Biocomput. 2024;29:627-640.
High-throughput profiling of multiomics data provides a valuable resource for better understanding complex human diseases such as cancer and for potentially uncovering new subtypes. Integrative clustering has emerged as a powerful unsupervised learning framework for subtype discovery. In this paper, we propose an efficient weighted integrative clustering method, intCC, that combines ensemble methods, consensus clustering, and kernel learning integrative clustering. Through extensive simulation studies and a case study on the TCGA pan-cancer datasets, we show that intCC can accurately uncover latent cluster structures. An R package, intCC, implementing the proposed method is available at https://github.com/candsj/intCC.