Wessel N. van Wieringen, Mark A. van de Wiel, Bauke Ylstra
Department of Mathematics, Vrije Universiteit, De Boelelaan 1081a, 1081 HV Amsterdam, The Netherlands.
Biostatistics. 2008 Jul;9(3):484-500. doi: 10.1093/biostatistics/kxm048. Epub 2007 Dec 22.
Array comparative genomic hybridization (aCGH) is a laboratory technique for measuring chromosomal copy number changes. The measurements gain a clear biological interpretation when they are mapped onto an ordinal scale with the categories loss, normal, and gain of a copy. The pattern of gains and losses is, to some degree, tumor specific. Here, we present WECCA (weighted clustering of called aCGH data), a method for weighted clustering of samples on the basis of ordinal aCGH data. We propose two similarities for use in the clustering that are particularly suited to ordinal data, and we generalize them to handle weighted observations. In addition, we introduce a new form of linkage that is especially suited to ordinal data. In a simulation study, we show that the proposed clustering method is competitive with clustering based on the continuous data. We illustrate WECCA with an application to a breast cancer data set, in which WECCA finds a clustering that relates more closely to survival than the original clustering.
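The general idea of clustering samples on weighted, called aCGH data can be sketched in a few lines of Python. This is an illustration only and does not reproduce WECCA: the abstract does not define its two similarities or its new linkage, so the sketch substitutes a simple weighted agreement score and ordinary average linkage. The function names `weighted_agreement` and `cluster_profiles`, the coding of calls as -1 (loss), 0 (normal), and 1 (gain), and the toy profiles and weights are all assumptions made for the example.

```python
from itertools import combinations
from typing import List, Sequence


def weighted_agreement(a: Sequence[int], b: Sequence[int],
                       weights: Sequence[float]) -> float:
    """Weighted share of regions in which two called profiles carry the same call.

    Assumed stand-in similarity; not one of the similarities defined for WECCA.
    """
    if not (len(a) == len(b) == len(weights)):
        raise ValueError("profiles and weights must have equal length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    agreeing = sum(w for x, y, w in zip(a, b, weights) if x == y)
    return agreeing / total


def cluster_profiles(profiles: Sequence[Sequence[int]],
                     weights: Sequence[float],
                     n_clusters: int) -> List[List[int]]:
    """Agglomerative clustering of samples with average linkage (assumed linkage)."""
    n = len(profiles)
    if not 1 <= n_clusters <= n:
        raise ValueError("n_clusters must lie between 1 and the number of samples")

    # Pairwise similarity matrix between samples.
    sim = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        sim[i][j] = sim[j][i] = weighted_agreement(profiles[i], profiles[j], weights)

    # Start with one cluster per sample and merge the most similar pair each round.
    clusters = [[i] for i in range(n)]
    while len(clusters) > n_clusters:
        best_pair, best_sim = None, -1.0
        for p, q in combinations(range(len(clusters)), 2):
            pairs = [(i, j) for i in clusters[p] for j in clusters[q]]
            mean_sim = sum(sim[i][j] for i, j in pairs) / len(pairs)
            if mean_sim > best_sim:
                best_pair, best_sim = (p, q), mean_sim
        p, q = best_pair  # p < q, so deleting q leaves index p valid
        clusters[p] = sorted(clusters[p] + clusters[q])
        del clusters[q]
    return clusters


# Toy data: four samples called over five regions (-1 loss, 0 normal, 1 gain).
profiles = [
    [-1, -1, 0, 1, 1],
    [-1, -1, 0, 1, 0],
    [1, 0, 0, -1, -1],
    [1, 0, 1, -1, -1],
]
weights = [2.0, 1.0, 1.0, 1.0, 0.5]  # hypothetical per-region weights

clusters = cluster_profiles(profiles, weights, n_clusters=2)
print(clusters)  # [[0, 1], [2, 3]]
```

In this toy example the first two samples share losses in the heavily weighted leading regions and the last two share gains there, so they pair up. Any similarity that respects the ordering of the calls could be swapped in for `weighted_agreement` without changing the clustering loop.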