Department of Computer Science and Engineering, University of Nevada, Reno, Nevada 89557, USA.
Department of Computer Science, Wayne State University, Detroit, Michigan 48202, USA.
Genome Res. 2017 Dec;27(12):2025-2039. doi: 10.1101/gr.215129.116. Epub 2017 Oct 24.
Advances in high-throughput technologies allow for measurements of many types of omics data, yet the meaningful integration of several different data types remains a significant challenge. Another important and difficult problem is the discovery of molecular disease subtypes characterized by relevant clinical differences, such as survival. Here we present a novel approach, called Perturbation clustering for data INtegration and disease Subtyping (PINS), which is able to address both challenges. The framework has been validated on thousands of cancer samples, using gene expression, DNA methylation, noncoding microRNA, and copy number variation data available from the Gene Expression Omnibus, the Broad Institute, The Cancer Genome Atlas (TCGA), and the European Genome-Phenome Archive. This simultaneous subtyping approach accurately identifies known cancer subtypes and novel subgroups of patients with significantly different survival profiles. The results were obtained from genome-scale molecular data without any other type of prior knowledge. The approach is sufficiently general to replace existing unsupervised clustering approaches outside the scope of bio-medical research, with the additional ability to integrate multiple types of data.