Brito Isabel, Hupé Philippe, Neuvial Pierre, Barillot Emmanuel
Institut Curie, Paris, France ; INSERM, U900, Paris, France ; Mines ParisTech, Fontainebleau, France.
PLoS One. 2013 Dec 5;8(12):e81458. doi: 10.1371/journal.pone.0081458. eCollection 2013.
Array-CGH can be used to determine DNA copy number, imbalances in which are a fundamental factor in the genesis and progression of tumors. The discovery of classes with similar patterns of array-CGH profiles therefore adds to our understanding of cancer and the treatment of patients. Various input data representations for array-CGH, dissimilarity measures between tumor samples and clustering algorithms may be used for this purpose. The choice between procedures is often difficult. An evaluation procedure is therefore required to select the best class discovery method (combination of one input data representation, one dissimilarity measure and one clustering algorithm) for array-CGH. Robustness of the resulting classes is a common requirement, but no stability-based comparison of class discovery methods for array-CGH profiles has ever been reported.
We applied several class discovery methods and evaluated the stability of their solutions with a modified version of Bertoni's χ²-based test [1]. Our version relaxes the independence assumption required by Bertoni's original χ²-based test. We conclude that the most robust classes of array-CGH profiles are obtained with Minimal Regions of alteration (a concept introduced in [2]) as the input data representation, sim [3] or agree [4] as the dissimilarity measure, and average group distance in the clustering algorithm.
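To make the idea of a stability-based evaluation concrete, the following is a minimal Python sketch under stated assumptions, not the procedure used in the study. It assumes profiles are encoded as discrete statuses (1 = gain, 0 = normal, -1 = loss) and uses a simple mismatch fraction as a stand-in dissimilarity (it is not the sim or agree measure). It scores stability as the mean Rand agreement between clusterings of random subsamples and a reference clustering, which is a generic resampling score rather than the modified χ²-based test. All function names and the toy data are hypothetical.

```python
import random
from itertools import combinations


def mismatch(a, b):
    """Stand-in dissimilarity: fraction of positions whose status differs."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def average_linkage(profiles, k):
    """Naive agglomerative clustering using average group distance."""
    clusters = [[i] for i in range(len(profiles))]
    while len(clusters) > k:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            # Average dissimilarity over all cross-cluster pairs.
            total = sum(mismatch(profiles[p], profiles[q])
                        for p in clusters[i] for q in clusters[j])
            d = total / (len(clusters[i]) * len(clusters[j]))
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        clusters[i] += clusters[j]  # merge the closest pair (i < j)
        del clusters[j]
    labels = [0] * len(profiles)
    for c, members in enumerate(clusters):
        for m in members:
            labels[m] = c
    return labels


def rand_index(l1, l2):
    """Fraction of sample pairs on which two labelings agree."""
    pairs = list(combinations(range(len(l1)), 2))
    agree = sum((l1[i] == l1[j]) == (l2[i] == l2[j]) for i, j in pairs)
    return agree / len(pairs)


def stability(profiles, k, n_rounds=20, fraction=0.8, seed=0):
    """Mean agreement between subsample clusterings and the full clustering."""
    rng = random.Random(seed)
    n = len(profiles)
    reference = average_linkage(profiles, k)
    size = min(n, max(k + 1, int(fraction * n)))
    scores = []
    for _ in range(n_rounds):
        idx = sorted(rng.sample(range(n), size))
        sub = average_linkage([profiles[i] for i in idx], k)
        scores.append(rand_index(sub, [reference[i] for i in idx]))
    return sum(scores) / len(scores)


# Toy data (hypothetical): four profiles with gains, four with losses.
profiles = [
    [1, 1, 1, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, -1, -1, -1],
    [0, 0, 0, 0, -1, -1, -1, -1],
    [0, 0, 0, 0, 0, 0, -1, -1],
    [0, 0, 0, 0, 0, -1, -1, -1],
]
score = stability(profiles, k=2)
print(f"stability score for k=2: {score:.2f}")
```

Under this sketch, a class discovery method would be preferred when its score stays high across resampling rounds. Comparing methods would amount to swapping in a different input representation, dissimilarity function or linkage rule and recomputing the score.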
The software is available from http://bioinfo.curie.fr/projects/cgh-clustering. It has also been partly integrated into "Visualization and analysis of array-CGH" (VAMP) [5]. The data sets used are publicly available from ACTuDB [6].