Department of Biostatistics & Data Science, University of Kansas Medical Center, Kansas City, KS, USA.
Department of Population Health Science & Policy, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
Methods Mol Biol. 2023;2629:73-93. doi: 10.1007/978-1-0716-2986-4_5.
Cancers are heterogeneous diseases caused by accumulated mutations or abnormal alterations at multiple levels of biological processes, including the genomic, epigenomic, transcriptomic, and proteomic levels. There is great clinical interest in identifying cancer molecular subtypes for disease prognosis and personalized medicine. Integrative clustering is a powerful unsupervised learning method that is increasingly used to identify cancer molecular subtypes from multi-omics data, including somatic mutations, DNA copy numbers, DNA methylation, and gene expression. Integrative clustering methods are generally classified as either model-based or nonparametric. In this chapter, we give an overview of frequently used model-based methods, including iCluster, iClusterPlus, and iClusterBayes, and of the nonparametric method integrative nonnegative matrix factorization (intNMF). We use integrative analyses of uveal melanoma and lower-grade glioma to illustrate these representative methods. Finally, we discuss their strengths and limitations and give suggestions for performing integrative analyses of cancer multi-omics data in practice.