IEEE/ACM Trans Comput Biol Bioinform. 2017 Sep-Oct;14(5):1115-1121. doi: 10.1109/TCBB.2016.2621769. Epub 2016 Oct 26.
One major goal of large-scale cancer omics studies is to understand the molecular mechanisms of cancer and to find new biomedical targets. High-dimensional, multidimensional cancer omics data (DNA methylation, mRNA expression, etc.) can yield new insight into cancer subtypes, and clustering methods usually handle such data by finding an effective low-dimensional subspace of the original data and then clustering cancer samples in that reduced subspace. However, because of the diversity of data types and the large data volume, few methods can integrate these data and map them into an effective low-dimensional subspace. In this paper, we develop a dimension-reduction and data-integration method for identifying cancer subtypes, named Scluster. First, Scluster projects each original data type separately into its principal subspace by an adaptive sparse reduced-rank regression method. Then, a fused patient-by-patient network is obtained for these subgroups through a scaled exponential similarity kernel method. Finally, candidate cancer subtypes are identified using spectral clustering. We demonstrate the efficiency of Scluster on three cancers by jointly analyzing mRNA expression, miRNA expression, and DNA methylation data. The evaluation results and analyses show that Scluster is effective for predicting survival and identifies novel cancer subtypes from large-scale multi-omics data.
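The three stages named in the abstract (per-data-type projection, a fused patient-by-patient network, spectral clustering) can be illustrated with a minimal NumPy sketch. It is not the Scluster implementation, and several pieces are assumptions: plain PCA stands in for the adaptive sparse reduced-rank regression step, which the abstract does not specify; the kernel uses one form of the scaled exponential similarity kernel found in the similarity-network-fusion literature, with hypothetical defaults `k` and `mu`; the per-type kernels are fused by simple averaging; and all function names are invented for illustration.

```python
import numpy as np


def pca_project(X, n_components):
    """Project samples (rows of X) onto the top principal components.

    Assumption: plain PCA is a stand-in for the adaptive sparse
    reduced-rank regression step, which is not specified here.
    """
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * S[:n_components]


def scaled_exp_similarity(Z, k=5, mu=0.5):
    """Patient-by-patient similarity W = exp(-d^2 / (mu * eps)).

    eps[i, j] averages the mean k-nearest-neighbour distance of i,
    that of j, and d[i, j]. k and mu are hypothetical defaults.
    """
    sq = np.sum(Z ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (Z @ Z.T), 0.0)
    d = np.sqrt(d2)
    # Column 0 of each sorted row is the self-distance, so skip it.
    knn_mean = np.sort(d, axis=1)[:, 1:k + 1].mean(axis=1)
    eps = (knn_mean[:, None] + knn_mean[None, :] + d) / 3.0
    return np.exp(-d2 / (mu * eps + 1e-12))


def spectral_clusters(W, n_clusters, n_iter=50):
    """Normalized spectral clustering on a similarity matrix W."""
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L_sym = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L_sym)  # eigenvalues in ascending order
    emb = vecs[:, :n_clusters]
    emb = emb / (np.linalg.norm(emb, axis=1, keepdims=True) + 1e-12)

    # Small k-means with deterministic farthest-first initialisation.
    centers = [emb[0]]
    for _ in range(1, n_clusters):
        dist = np.min([((emb - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(emb[dist.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((emb[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for c in range(n_clusters):
            if np.any(labels == c):
                centers[c] = emb[labels == c].mean(axis=0)
    return labels


def cluster_multi_omics(views, n_components, n_clusters):
    """Reduce each data type, build one kernel per type, fuse, cluster.

    Assumption: fusion is a plain average of the per-type kernels.
    """
    kernels = [scaled_exp_similarity(pca_project(X, n_components))
               for X in views]
    fused = np.mean(kernels, axis=0)
    return spectral_clusters(fused, n_clusters)
```

Here `views` would be a list of patient-by-feature matrices that share the same patient order, for example one each for mRNA expression, miRNA expression, and DNA methylation. Averaging the kernels is the simplest possible fusion and treats every data type equally; an iterative network-fusion scheme would be a natural replacement for that single line.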