Chalise Prabhakar, Koestler Devin C, Bimali Milan, Yu Qing, Fridley Brooke L
Department of Biostatistics, University of Kansas Medical Center, Kansas City, KS 66160, USA.
Transl Cancer Res. 2014 Jun 1;3(3):202-216. doi: 10.3978/j.issn.2218-676X.2014.06.03.
High-throughput 'omic' data, such as gene expression, DNA methylation, and DNA copy number, have played an instrumental role in furthering our understanding of the molecular basis of human health and disease states. Because cells with similar morphological characteristics can exhibit entirely different molecular profiles, and because these discrepancies might further our understanding of patient-level variability in clinical outcomes, there is significant interest in using high-throughput 'omic' data to identify novel molecular subtypes of a disease. While numerous clustering methods have been proposed for identifying molecular subtypes, most were developed for a single 'omic' data type and may not be appropriate when more than one 'omic' data type is collected on study subjects. Given that complex diseases, such as cancer, arise as a result of genomic, epigenomic, transcriptomic, and proteomic alterations, integrative clustering methods for the simultaneous clustering of multiple 'omic' data types have great potential to aid in molecular subtype discovery. Traditionally, ad hoc manual data integration has been performed using the results obtained from clustering individual 'omic' data types on the same set of patient samples. However, such methods often result in inconsistent assignment of subjects to molecular cancer subtypes. Recently, several methods have been proposed in the literature that offer a rigorous framework for the simultaneous integration of multiple 'omic' data types in a single comprehensive analysis. In this paper, we present a systematic review of existing integrative clustering methods.