Laboratoire de Biometrie et Biologie Evolutive, UMR CNRS 5558 - Univ. Lyon 1, F-69622, Villeurbanne, France.
Biostatistics. 2011 Jul;12(3):413-28. doi: 10.1093/biostatistics/kxq076. Epub 2011 Jan 5.
The statistical analysis of array comparative genomic hybridization (CGH) data has now shifted to the joint assessment of copy number variations at the cohort level. Considering multiple profiles makes it possible to correct for systematic biases observed on single profiles, such as probe GC content or the so-called "wave effect." In this article, we extend the segmentation model developed in the univariate case to the joint analysis of multiple CGH profiles. Our contribution is manifold: we propose an integrated model to perform joint segmentation, normalization, and calling for multiple array CGH profiles. This model shows great flexibility, especially in the modeling of the wave effect, where it gives a likelihood framework to approaches proposed by others. We propose a new dynamic programming algorithm for breakpoint positioning, as well as a model selection criterion based on a modified Bayesian information criterion proposed in the univariate case. The performance of our method is assessed using simulated and real data sets. Our method is implemented in the R package cghseg.
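To make the idea of breakpoint positioning by dynamic programming concrete, the following is a minimal sketch of the classical least-squares segmentation of a single profile into a fixed number of segments. It is not the new algorithm proposed in the article and it does not use the cghseg API. The function name `segment_profile` and its arguments are hypothetical, the noise is assumed Gaussian with constant variance, and the number of segments is assumed known (choosing it with a modified BIC is not shown).

```python
import numpy as np


def segment_profile(y, n_segments):
    """Least-squares segmentation of one profile into n_segments pieces.

    Hypothetical helper for illustration only: a textbook dynamic program
    for a piecewise-constant mean, not the multi-profile method of the article.
    Returns (breakpoints, residual_sum_of_squares), where each breakpoint is
    the index at which a new segment starts.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    if not 1 <= n_segments <= n:
        raise ValueError("n_segments must be between 1 and len(y)")

    # Prefix sums let us compute any segment's residual sum of squares in O(1).
    s1 = np.concatenate(([0.0], np.cumsum(y)))
    s2 = np.concatenate(([0.0], np.cumsum(y ** 2)))

    def cost(i, j):
        # Residual sum of squares of y[i:j] around its own mean.
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    # best[k, j]: minimal cost of splitting y[:j] into exactly k segments.
    # last[k, j]: start index of the k-th segment in that optimal split.
    best = np.full((n_segments + 1, n + 1), np.inf)
    last = np.zeros((n_segments + 1, n + 1), dtype=int)
    best[0, 0] = 0.0

    for k in range(1, n_segments + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                c = best[k - 1, i] + cost(i, j)
                if c < best[k, j]:
                    best[k, j] = c
                    last[k, j] = i

    # Backtrack from the full profile to recover the segment starts.
    breakpoints = []
    j = n
    for k in range(n_segments, 1, -1):
        j = last[k, j]
        breakpoints.append(int(j))
    breakpoints.reverse()
    return breakpoints, float(best[n_segments, n])
```

The triple loop costs O(K n^2) for K segments and n probes, which is the usual price of exact segmentation. A joint analysis of several profiles, as described in the abstract, would additionally have to share information across profiles, which this sketch does not attempt.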