Lai Weil R, Johnson Mark D, Kucherlapati Raju, Park Peter J
Harvard-Partners Center for Genetics and Genomics, 77 Avenue Louis Pasteur, Boston, MA 02115, USA.
Bioinformatics. 2005 Oct 1;21(19):3763-70. doi: 10.1093/bioinformatics/bti611. Epub 2005 Aug 4.
Array Comparative Genomic Hybridization (CGH) can reveal chromosomal aberrations in genomic DNA. These amplifications and deletions at the DNA level are important in the pathogenesis of cancer and other diseases. Although many approaches have been proposed for analyzing large array CGH datasets, their relative merits in practice are not clear.
We compare 11 algorithms for analyzing array CGH data. These include both segment detection methods and smoothing methods, based on diverse techniques such as mixture models, hidden Markov models, maximum likelihood, regression, wavelets and genetic algorithms. We compute Receiver Operating Characteristic (ROC) curves on simulated data to quantify sensitivity and specificity at various signal-to-noise ratios and for abnormalities of different sizes. We also characterize the performance of the algorithms on chromosomal regions of interest in a real dataset obtained from patients with glioblastoma multiforme. Comparisons of this type are difficult because the parameters chosen for each method may be sub-optimal; they nevertheless reveal general characteristics that are helpful to the biological investigator.
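The ROC evaluation on simulated data described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the study's actual setup: the profile length, the aberration widths, the noise level, the definition of signal-to-noise ratio as mean shift divided by noise standard deviation, and the function names (`simulate_profile`, `moving_average`, `roc_points`, `auc`, `pooled_auc`). The moving average is a stand-in smoother and is not necessarily one of the 11 algorithms compared.

```python
import random


def simulate_profile(n_probes=100, width=20, snr=2.0, noise_sd=0.25, rng=random):
    """Simulate log2 ratios for one chromosome with a single centered gain.

    Assumption: SNR is the mean shift of the aberration divided by the
    standard deviation of the Gaussian noise.
    """
    start = (n_probes - width) // 2
    truth = [start <= i < start + width for i in range(n_probes)]
    shift = snr * noise_sd
    values = [(shift if t else 0.0) + rng.gauss(0.0, noise_sd) for t in truth]
    return values, truth


def moving_average(values, window=5):
    """Stand-in smoother: centered moving average, truncated at the ends."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def roc_points(scores, truth):
    """Sweep a threshold over the scores; return (FPR, TPR) pairs from (0,0) to (1,1)."""
    n_pos = sum(truth)
    n_neg = len(truth) - n_pos
    ranked = sorted(zip(scores, truth), key=lambda p: p[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, is_pos) in enumerate(ranked):
        if is_pos:
            tp += 1
        else:
            fp += 1
        # Emit one point per distinct score so ties share a threshold.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def pooled_auc(snr, width, n_profiles=50, seed=0):
    """Pool probes from many simulated profiles and score the smoother."""
    rng = random.Random(seed)
    scores, truth = [], []
    for _ in range(n_profiles):
        values, labels = simulate_profile(width=width, snr=snr, rng=rng)
        scores.extend(moving_average(values))
        truth.extend(labels)
    return auc(roc_points(scores, truth))


if __name__ == "__main__":
    # Illustrative grid of aberration widths and SNR levels (assumed values).
    for width in (5, 20):
        for snr in (1.0, 2.0, 4.0):
            print(f"width={width:2d} snr={snr:.1f} AUC={pooled_auc(snr, width):.3f}")
```

Each probe is labeled as aberrant or normal by construction, so sensitivity is the true positive rate among aberrant probes and specificity is one minus the false positive rate among normal probes. Replacing `moving_average` with another method's per-probe output would allow the same comparison across methods, under the same assumptions.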