Broët Philippe, Richardson Sylvia
Faculté de Médecine, Université Paris-XI IFR69, 16 Avenue Paul Vaillant Couturier 94807 Villejuif Cedex, France.
Bioinformatics. 2006 Apr 15;22(8):911-8. doi: 10.1093/bioinformatics/btl035. Epub 2006 Feb 2.
Comparative genomic hybridization array experiments that investigate gene copy number changes present new challenges for statistical analysis and call for methods that incorporate spatial dependence between sequences along the chromosome. For this purpose, we propose a novel method called CGHmix. It is based on a spatially structured mixture model with three states corresponding to genomic sequences that are either unmodified, deleted or amplified. Inference is performed in a Bayesian framework. From the output, posterior probabilities of belonging to each of the three states are estimated for each genomic sequence and used to classify them.
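As a rough illustration of the classification step, the sketch below computes posterior state probabilities under a plain three-component Gaussian mixture and assigns each sequence to its most probable state. It is not the CGHmix model: it ignores spatial dependence along the chromosome and uses fixed parameters rather than Bayesian inference. The Gaussian form, the names `posterior_probs` and `classify`, and all parameter values are assumptions made for this example.

```python
import math

# State labels taken from the abstract; their order here is an arbitrary choice.
STATES = ("deleted", "unmodified", "amplified")


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posterior_probs(x, weights, means, sds):
    """Posterior probability of each state for one log-ratio x (Bayes' rule)."""
    joint = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
    total = sum(joint)
    return [j / total for j in joint]


def classify(x, weights, means, sds):
    """Return the most probable state and the full posterior vector."""
    probs = posterior_probs(x, weights, means, sds)
    best = max(range(len(STATES)), key=lambda k: probs[k])
    return STATES[best], probs


# Hypothetical mixture parameters (assumed, not taken from the paper).
WEIGHTS = (0.2, 0.6, 0.2)
MEANS = (-0.8, 0.0, 0.8)
SDS = (0.3, 0.2, 0.3)

# Hypothetical log-ratios for three genomic sequences.
log_ratios = [-0.9, 0.05, 1.1]
labels = [classify(x, WEIGHTS, MEANS, SDS)[0] for x in log_ratios]
```

In a spatially structured model, the fixed `WEIGHTS` would be replaced by weights that vary along the chromosome, so that neighbouring sequences tend to share a state.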
Using simulated data, CGHmix is validated and compared with both a conventional unstructured mixture model and a recently proposed data mining method. We demonstrate the good performance of CGHmix for classifying copy number changes. In addition, the method provides a good estimate of the false discovery rate. We also present the analysis of a cancer-related dataset.
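The abstract does not say how the false discovery rate is estimated. One common approach with posterior probabilities, shown here only as an assumed illustration, is to average the posterior probability of the unmodified state over all sequences declared modified. The function name `estimated_fdr`, the decision threshold, and the input values are hypothetical.

```python
def estimated_fdr(p_unmodified, threshold=0.5):
    """Estimate the FDR among sequences declared modified.

    A sequence is declared modified when its posterior probability of being
    unmodified falls below `threshold` (an assumed decision rule). The estimate
    is the mean of those posterior probabilities, i.e. the expected fraction
    of false calls among the declared sequences.
    """
    called = [p for p in p_unmodified if p < threshold]
    if not called:
        return 0.0, 0  # nothing declared modified, so no false discoveries
    return sum(called) / len(called), len(called)


# Hypothetical posterior probabilities of the unmodified state.
p_unmodified = [0.02, 0.10, 0.90, 0.30, 0.95]
fdr, n_called = estimated_fdr(p_unmodified)
```

Lowering `threshold` declares fewer sequences modified and typically lowers the estimated rate, which is the usual trade-off when choosing a classification cutoff.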