Department of Biomedical Informatics, University of Utah, 26 South 2000 East, Salt Lake City, UT 84112, USA.
BMC Med Genomics. 2011 Jun 29;4:52. doi: 10.1186/1755-8794-4-52.
With the advent of whole-genome analysis for profiling tumor tissue, a pressing need has emerged for principled methods of organizing the large amounts of resulting genomic information. We propose the concept of multiplicity measures on cancer and gene networks to organize the information in a clinically meaningful manner. Multiplicity applied in this context extends Fearon and Vogelstein's multi-hit genetic model of colorectal carcinoma across multiple cancers.
Using the Catalogue of Somatic Mutations in Cancer (COSMIC), we construct networks of interacting cancers and genes. Multiplicity is calculated as the number of cancers and genes linked by a measured somatic mutation. The Kamada-Kawai algorithm is used to find a two-dimensional minimum-energy layout, with multiplicity as the input similarity measure, so that cancers and genes are positioned in two dimensions according to this similarity. A third dimension is added to the network by assigning a maximal multiplicity to each cancer or gene. Hierarchical clustering within this three-dimensional network is then used to identify clusters of similar somatic mutation patterns across cancer types.
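The multiplicity calculation can be illustrated with a minimal sketch. The abstract does not give the exact formula, so the definitions here are assumptions: the multiplicity of two genes is taken as the number of cancer types in which both are mutated, and the maximal multiplicity of a gene is its largest pairwise value. The records, the names `gene_A` and `cancer_1`, and all helper functions are hypothetical placeholders, not COSMIC data or the authors' code.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical (cancer type, mutated gene) records; placeholders, not COSMIC data.
records = [
    ("cancer_1", "gene_A"),
    ("cancer_1", "gene_C"),
    ("cancer_1", "gene_D"),
    ("cancer_2", "gene_C"),
    ("cancer_2", "gene_D"),
    ("cancer_3", "gene_B"),
    ("cancer_3", "gene_D"),
]


def build_links(records):
    """Index the records in both directions: gene -> cancers, cancer -> genes."""
    cancers_by_gene = defaultdict(set)
    genes_by_cancer = defaultdict(set)
    for cancer, gene in records:
        cancers_by_gene[gene].add(cancer)
        genes_by_cancer[cancer].add(gene)
    return cancers_by_gene, genes_by_cancer


def pairwise_multiplicity(neighbors_by_node):
    """Assumed similarity: number of neighbors shared by each pair of nodes."""
    return {
        (a, b): len(neighbors_by_node[a] & neighbors_by_node[b])
        for a, b in combinations(sorted(neighbors_by_node), 2)
    }


def maximal_multiplicity(similarity, nodes):
    """Assumed third coordinate: the largest pairwise value seen for each node."""
    best = {node: 0 for node in nodes}
    for (a, b), value in similarity.items():
        best[a] = max(best[a], value)
        best[b] = max(best[b], value)
    return best


cancers_by_gene, genes_by_cancer = build_links(records)
gene_similarity = pairwise_multiplicity(cancers_by_gene)
gene_height = maximal_multiplicity(gene_similarity, cancers_by_gene)
cancer_similarity = pairwise_multiplicity(genes_by_cancer)
```

Under these assumptions, `gene_similarity` would serve as the similarity input to a layout algorithm and `gene_height` as the third coordinate. The same two functions apply to cancers by swapping the index, as `cancer_similarity` shows.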
Clustering genes in the three-dimensional network reveals a similarity in acquired mutations across different cancer types. Surprisingly, the clusters separate known causal mutations. The multiplicity clustering technique identifies a set of causal genes with an area under the ROC curve of 0.84, versus 0.57 when clustering on gene mutation rate alone. The cluster multiplicity value and the number of causal genes are positively correlated (Spearman's rank-order correlation, rs(8) = 0.894, Spearman's t = 17.48, p < 0.05). Clustering of cancer types segregates different types of cancer: all blood tumors cluster together, and the cluster multiplicity values differ significantly (Kruskal-Wallis, H = 16.98, df = 2, p < 0.05).
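For readers unfamiliar with the two summary statistics reported here, the following sketch shows how an area under the ROC curve and a Spearman rank correlation can be computed from scratch. It is illustrative only: the function names are hypothetical, the inputs are toy values rather than the study's data, and the abstract does not say which software the authors used.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Toy inputs: a score per gene and a flag marking it as a known causal gene.
toy_scores = [3, 2, 2, 1]
toy_labels = [1, 1, 0, 0]
toy_auc = auc(toy_scores, toy_labels)
toy_rho = spearman([1, 2, 3, 4], [10, 20, 30, 40])
```

An AUC of 0.5 corresponds to a ranking no better than chance and 1.0 to perfect separation, which is the scale on which the reported 0.84 and 0.57 should be read.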
We demonstrate the principle of multiplicity for organizing somatic mutations and cancers into clinically relevant clusters. These clusters of cancers and mutations provide representations that identify segregations of cancer types and of genes driving cancer progression.