Picard F, Robin S, Lebarbier E, Daudin J-J
UMR INA P-G/ENGREF/INRA MIA 518, Paris, France.
Biometrics. 2007 Sep;63(3):758-66. doi: 10.1111/j.1541-0420.2006.00729.x.
Microarray-CGH (comparative genomic hybridization) experiments are used to detect and map chromosomal imbalances. A CGH profile can be viewed as a succession of segments that represent homogeneous regions in the genome whose representative sequences share the same relative copy number on average. Segmentation methods constitute a natural framework for the analysis, but they do not provide a biological status for the detected segments. We propose a new model for this segmentation/clustering problem, combining a segmentation model with a mixture model. We present a new hybrid algorithm, dynamic programming-expectation maximization (DP-EM), which combines DP with the EM algorithm to estimate the parameters of the model by maximum likelihood. We also propose a model selection heuristic to select the number of clusters and the number of segments. An example of our procedure is presented, based on publicly available data sets. We compare our method to segmentation methods and to hidden Markov models, and we show that the new segmentation/clustering model is a promising alternative that can be applied in the more general context of signal processing.
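The alternation between DP and EM described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes Gaussian clusters with a common variance, a fixed number of segments `K` and clusters `P` (the model selection heuristic is not covered), a quantile-based initialization, and a small variance floor. The function names `segment_loglik`, `dp_step`, `em_step`, and `dp_em` are hypothetical.

```python
import numpy as np

def segment_loglik(y, pi, mu, sigma):
    """cost[i, j] = log sum_p pi_p * prod_{t=i}^{j-1} N(y_t | mu_p, sigma^2), i < j."""
    n = len(y)
    logf = (-0.5 * np.log(2 * np.pi * sigma**2)
            - 0.5 * ((y[:, None] - mu[None, :]) / sigma) ** 2)   # (n, P)
    cum = np.vstack([np.zeros((1, len(mu))), np.cumsum(logf, axis=0)])
    cost = np.full((n + 1, n + 1), -np.inf)
    for i in range(n):
        seg = cum[i + 1:] - cum[i] + np.log(pi)      # rows are j = i+1..n
        m = seg.max(axis=1)
        cost[i, i + 1:] = m + np.log(np.exp(seg - m[:, None]).sum(axis=1))
    return cost

def dp_step(cost, K):
    """Best partition of 0..n into K contiguous segments, by dynamic programming."""
    n = cost.shape[0] - 1
    best = np.full((K + 1, n + 1), -np.inf)
    back = np.zeros((K + 1, n + 1), dtype=int)
    best[0, 0] = 0.0
    for k in range(1, K + 1):
        for j in range(k, n + 1):
            cand = best[k - 1, :j] + cost[:j, j]
            back[k, j] = int(np.argmax(cand))
            best[k, j] = cand[back[k, j]]
    bounds = [n]
    for k in range(K, 0, -1):                        # backtrack the breakpoints
        bounds.append(int(back[k, bounds[-1]]))
    return bounds[::-1]                              # [0, t_1, ..., n]

def em_step(y, bounds, pi, mu, sigma, n_iter=50):
    """EM for a mixture in which each whole segment belongs to one cluster."""
    segs = [y[a:b] for a, b in zip(bounds[:-1], bounds[1:])]
    n_k = np.array([len(s) for s in segs], dtype=float)
    s1 = np.array([s.sum() for s in segs])
    for _ in range(n_iter):
        # E-step: posterior cluster probabilities per segment (the common
        # variance makes the normalizing constants cancel across clusters).
        logw = np.array([[np.log(pi[p]) - 0.5 * ((s - mu[p]) ** 2).sum() / sigma**2
                          for p in range(len(mu))] for s in segs])
        tau = np.exp(logw - logw.max(axis=1, keepdims=True))
        tau /= tau.sum(axis=1, keepdims=True)
        # M-step: proportions, means, and common standard deviation.
        pi = np.clip(tau.mean(axis=0), 1e-8, None)
        pi /= pi.sum()
        w = tau.T @ n_k
        mu = np.where(w > 1e-12, (tau.T @ s1) / np.maximum(w, 1e-12), mu)
        ss = sum(tau[k, p] * ((s - mu[p]) ** 2).sum()
                 for k, s in enumerate(segs) for p in range(len(mu)))
        sigma = max(float(np.sqrt(ss / len(y))), 1e-3)   # floor is an assumption
    return pi, mu, sigma

def dp_em(y, K, P, n_outer=20):
    """Alternate DP (breakpoints) and EM (mixture parameters) until stable."""
    y = np.asarray(y, dtype=float)
    pi = np.full(P, 1.0 / P)
    mu = np.quantile(y, (np.arange(P) + 0.5) / P)    # assumed initialization
    sigma = max(float(y.std()), 1e-3)
    bounds = None
    for _ in range(n_outer):
        new_bounds = dp_step(segment_loglik(y, pi, mu, sigma), K)
        if new_bounds == bounds:
            break
        bounds = new_bounds
        pi, mu, sigma = em_step(y, bounds, pi, mu, sigma)
    return bounds, pi, mu, sigma
```

Calling `dp_em(y, K=3, P=2)` on a one-dimensional signal returns the segment boundaries together with the estimated cluster proportions, means, and common standard deviation. The DP step costs O(K n^2) time and O(n^2) memory here, so the sketch is suited to short profiles only.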