Department of Computational Biology and Applied Algorithmics, Max-Planck-Institute for Informatics, Campus E1.4, 66123 Saarbrücken, Germany.
Bioinformatics. 2013 Jul 15;29(14):1793-800. doi: 10.1093/bioinformatics/btt300. Epub 2013 May 28.
Recurrent DNA breakpoints in cancer genomes indicate the presence of critical functional elements for tumor development. Identifying them can help determine new therapeutic targets. High-dimensional DNA microarray experiments like arrayCGH afford the identification of DNA copy number breakpoints with high precision, offering a solid basis for computational estimation of recurrent breakpoint locations.
We introduce a method for identifying recurrent breakpoints (consensus breakpoints) in copy number aberration datasets. The method is based on weighted kernel counting of breakpoints around genomic locations; counts larger than expected by chance are considered significant. We show that consensus breakpoints facilitate consensus segmentation of the samples. Applying the method to three arrayCGH datasets, we show that consensus segmentation achieves significant dimension reduction, which is useful for predicting tumor phenotype from copy number data. We use our approach to classify neuroblastoma tumors from different age groups and confirm the recent recommendation of 18 months as the age cut-off for differential treatment. We also investigate the (epi)genetic properties at consensus breakpoint locations in seven datasets and show enrichment in overlap with important functional genomic regions.
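The idea of kernel counting with a chance-based significance threshold can be illustrated with a minimal Python sketch. It is not the authors' R implementation: the Gaussian kernel, the uniform-placement null model, the max-over-grid threshold, and all function and parameter names (`kernel_score`, `consensus_breakpoints`, `bandwidth`, `genome_length`) are assumptions made here for illustration only.

```python
import math
import random


def kernel_score(position, breakpoints, bandwidth):
    """Gaussian-weighted count of breakpoints around one genomic position.

    The Gaussian kernel is an assumption; the abstract only says "weighted kernel".
    """
    return sum(
        math.exp(-0.5 * ((position - b) / bandwidth) ** 2)
        for b in breakpoints
    )


def consensus_breakpoints(samples, grid, bandwidth, genome_length,
                          n_permutations=200, alpha=0.05, seed=0):
    """Return grid positions whose kernel count exceeds a chance-level threshold.

    samples: list of per-sample breakpoint position lists.
    Null model (assumed): the same number of breakpoints placed uniformly
    at random along the genome; the threshold is the (1 - alpha) quantile
    of the maximum score over the grid across permutations.
    """
    rng = random.Random(seed)
    pooled = [b for sample in samples for b in sample]
    observed = [kernel_score(g, pooled, bandwidth) for g in grid]

    null_max = []
    for _ in range(n_permutations):
        shuffled = [rng.uniform(0, genome_length) for _ in pooled]
        null_max.append(max(kernel_score(g, shuffled, bandwidth) for g in grid))
    null_max.sort()

    # Index of the (1 - alpha) empirical quantile, clamped to the valid range.
    idx = min(len(null_max) - 1,
              max(0, math.ceil((1 - alpha) * len(null_max)) - 1))
    threshold = null_max[idx]

    hits = [g for g, s in zip(grid, observed) if s > threshold]
    return hits, threshold


# Toy data: ten samples, each with one breakpoint close to position 500.
toy_samples = [[495 + i] for i in range(10)]
toy_grid = list(range(0, 1001, 10))
hits, threshold = consensus_breakpoints(
    toy_samples, toy_grid, bandwidth=10, genome_length=1000
)
```

In this toy setting the positions returned in `hits` cluster around 500, where breakpoints recur across samples. Taking the maximum over the grid in each permutation is one simple way to account for testing many positions at once; other null models and corrections are equally plausible.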
An R implementation of our approach is available at http://www.mpi-inf.mpg.de/~laura/FeatureGrouping.html.
Supplementary data are available at Bioinformatics online.