Department of Mathematics and Statistics, University of Missouri-Kansas City, 5100 Rockhill Road, Kansas City, MO 64110, USA.
IEEE/ACM Trans Comput Biol Bioinform. 2009 Oct-Dec;6(4):529-41. doi: 10.1109/TCBB.2008.129.
Array comparative genomic hybridization (aCGH) is a high-resolution, high-throughput technique for screening copy number variations (CNVs) across the entire genome. Compared to conventional CGH, it significantly improves the identification of chromosomal abnormalities. However, because of the random noise inherent in the imaging and hybridization process, identifying statistically significant DNA copy number changes in aCGH data is challenging. We propose a novel approach that uses the mean and variance change point model (MVCM) to detect CNVs, or breakpoints, in aCGH data sets. We derive an approximate p-value for the test statistic and give an estimate of the locus of the DNA copy number change. We carry out simulation studies to evaluate the accuracy of the locus estimate and of the p-value approximation. The simulation results show that the approach is effective in identifying copy number changes. The approach is also tested on publicly available fibroblast cancer cell line data, breast tumor cell line data, and breast cancer cell line aCGH data sets. On these cell lines, our approach detects biologically verified changes that the circular binary segmentation (CBS) method does not identify, with higher sensitivity and specificity than CBS.
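To make the idea of a mean and variance change point concrete, the following is a minimal sketch and not the authors' test statistic or p-value approximation. It assumes independent normal observations with at most one change, and scans every candidate locus with a generic likelihood-ratio statistic that compares a single-segment fit against a two-segment fit in which both the mean and the variance may differ. The function name `mean_variance_change_scan` and the parameter `min_seg` are hypothetical and chosen for illustration only.

```python
import numpy as np


def mean_variance_change_scan(x, min_seg=2):
    """Scan for a single change in mean and variance (illustrative only).

    Returns (k, stat), where k is the candidate locus maximizing the
    likelihood-ratio statistic: observations x[:k] form the first segment
    and x[k:] the second. Assumes independent normal data.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    if n < 2 * min_seg:
        raise ValueError("sequence too short for the requested min_seg")

    # Maximum-likelihood variance (divides by n) under "no change".
    var_all = x.var()

    best_k, best_stat = None, -np.inf
    for k in range(min_seg, n - min_seg + 1):
        var_left = x[:k].var()
        var_right = x[k:].var()
        if var_left <= 0 or var_right <= 0:
            continue  # skip degenerate segments with zero variance
        # -2 log likelihood ratio for a change in mean and variance at k.
        stat = (n * np.log(var_all)
                - k * np.log(var_left)
                - (n - k) * np.log(var_right))
        if stat > best_stat:
            best_k, best_stat = k, stat
    return best_k, best_stat


if __name__ == "__main__":
    # Simulated log-ratio profile: the mean and the spread both shift
    # after the first 60 probes. The values are made up for illustration.
    rng = np.random.default_rng(0)
    profile = np.concatenate([rng.normal(0.0, 0.1, 60),
                              rng.normal(0.6, 0.2, 40)])
    k, stat = mean_variance_change_scan(profile)
    print("estimated change locus:", k, "statistic:", round(stat, 2))
```

The maximizing index serves as the estimate of the change locus. Judging significance would additionally require the null distribution of the maximized statistic, which is the role of the approximate p-value derived in the paper and is not reproduced in this sketch.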