Lamy Philippe, Andersen Claus L, Dyrskjot Lars, Torring Niels, Wiuf Carsten
Bioinformatics Research Center, University of Aarhus, Hoegh-Guldbergsgade 10, Bldg 1090, 8000 Aarhus C, Denmark.
BMC Bioinformatics. 2007 Nov 9;8:434. doi: 10.1186/1471-2105-8-434.
Affymetrix SNP arrays can interrogate thousands of SNPs at the same time. This allows us to study the genomic content of cancer cells and to investigate the underlying events leading to cancer. Genomic copy-numbers are today routinely derived from SNP array data, but most of the algorithms proposed for this task disregard the genotype information available from the germline cells in paired germline-tumour samples. Including this information may deepen our understanding of the "true" biological situation, e.g. by enabling analysis of allele-specific copy-numbers. Here we rely on matched germline-tumour samples and have developed a Hidden Markov Model (HMM) to estimate allelic copy-number changes in tumour cells. Furthermore, with this approach we are able to estimate the proportion of normal cells in the tumour (mixture proportion).
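To make the mixture proportion concrete, the sketch below assumes that the measured allelic signal at a SNP is a linear blend of two cell populations. Normal cells carry the germline genotype, here a heterozygous AB SNP, and tumour cells carry an altered allelic copy-number state. This linear model and the function names `expected_allelic_copies` and `b_allele_fraction` are illustrative assumptions, not the paper's published model.

```python
from typing import Tuple


def expected_allelic_copies(tumour_state: Tuple[int, int],
                            normal_fraction: float,
                            germline: Tuple[int, int] = (1, 1)) -> Tuple[float, float]:
    """Expected (allele A, allele B) copy numbers in a mixed tumour sample.

    Hypothetical model: the observed signal is a linear mixture of normal cells
    carrying the germline genotype (default: heterozygous AB, one copy of each
    allele) and tumour cells carrying `tumour_state` copies of A and B.
    """
    if not 0.0 <= normal_fraction <= 1.0:
        raise ValueError("normal_fraction must lie in [0, 1]")
    tumour_fraction = 1.0 - normal_fraction
    g_a, g_b = germline
    t_a, t_b = tumour_state
    return (normal_fraction * g_a + tumour_fraction * t_a,
            normal_fraction * g_b + tumour_fraction * t_b)


def b_allele_fraction(a: float, b: float) -> float:
    """Fraction of the total allelic signal that comes from allele B."""
    return b / (a + b)


# Uniparental disomy (tumour carries AA: two A copies, no B copy) in a sample
# with 30% normal cells: the total stays at two copies, but the B-allele
# fraction moves away from 0.5 without reaching 0.
a, b = expected_allelic_copies((2, 0), normal_fraction=0.3)
print(f"A={a:.2f}, B={b:.2f}, BAF={b_allele_fraction(a, b):.2f}")
# prints A=1.70, B=0.30, BAF=0.15
```

Under this assumed model, normal-cell contamination pulls every tumour state towards the germline (1, 1) signal, which is why an unmodelled mixture proportion can blur the distinction between allelic states.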
We show that our method is able to recover the underlying copy-number changes in simulated data sets with high accuracy (above 97.71%). Moreover, although the known copy-numbers could be recovered well in simulated cancer samples with more than 70% cancer cells (and fewer than 30% normal cells), we demonstrate that including the mixture proportion in the HMM increases the accuracy of the method. Finally, the method is tested on HapMap samples and on bladder and prostate cancer samples.
The HMM method developed here uses the genotype calls of the germline DNA and the allelic SNP intensities of the tumour DNA to estimate allelic copy-numbers (including changes) in the tumour. It distinguishes between events such as uniparental disomy and allelic imbalance. Moreover, the HMM can estimate the mixture proportion and thus provide information on the purity of the tumour sample.
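The abstract does not give the HMM's emission or transition models. The sketch below only illustrates how an HMM over allelic states can separate uniparental disomy from other allelic imbalances at germline-heterozygous SNPs. It rests on several assumptions: a fixed set of five (A copies, B copies) tumour states, an isotropic Gaussian noise model on observed allelic copy numbers, a small uniform switching probability, and a known normal-cell fraction. The names `STATES`, `log_emission` and `viterbi` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

State = Tuple[int, int]  # tumour copies of (allele A, allele B)

# Hypothetical state space at SNPs that are heterozygous (AB) in the germline.
STATES: Dict[State, str] = {
    (1, 1): "normal",
    (2, 0): "uniparental disomy (copy-neutral LOH)",
    (1, 0): "deletion with LOH",
    (2, 1): "allelic imbalance (gain of one allele)",
    (2, 2): "balanced gain",
}


def log_emission(obs: Tuple[float, float], state: State,
                 normal_fraction: float, sd: float) -> float:
    """Assumed noise model: isotropic 2-D Gaussian around the mixture mean."""
    tumour_fraction = 1.0 - normal_fraction
    mu_a = normal_fraction + tumour_fraction * state[0]  # normal cells add one A copy
    mu_b = normal_fraction + tumour_fraction * state[1]  # ... and one B copy
    sq = (obs[0] - mu_a) ** 2 + (obs[1] - mu_b) ** 2
    return -sq / (2 * sd ** 2) - math.log(2 * math.pi * sd ** 2)


def viterbi(observations: Sequence[Tuple[float, float]], normal_fraction: float = 0.0,
            sd: float = 0.25, switch_prob: float = 0.01) -> List[State]:
    """Most likely sequence of allelic states along ordered SNPs (log-space Viterbi)."""
    states = list(STATES)
    k = len(states)
    log_stay = math.log(1.0 - switch_prob)
    log_move = math.log(switch_prob / (k - 1))  # switches spread uniformly

    def log_trans(i: int, j: int) -> float:
        return log_stay if i == j else log_move

    # Uniform initial distribution over states.
    scores = [-math.log(k) + log_emission(observations[0], s, normal_fraction, sd)
              for s in states]
    backpointers: List[List[int]] = []
    for obs in observations[1:]:
        new_scores, pointers = [], []
        for j, s in enumerate(states):
            best_i = max(range(k), key=lambda i: scores[i] + log_trans(i, j))
            new_scores.append(scores[best_i] + log_trans(best_i, j)
                              + log_emission(obs, s, normal_fraction, sd))
            pointers.append(best_i)
        scores = new_scores
        backpointers.append(pointers)

    # Trace back from the highest-scoring final state.
    idx = max(range(k), key=lambda i: scores[i])
    path = [idx]
    for pointers in reversed(backpointers):
        idx = pointers[idx]
        path.append(idx)
    return [states[i] for i in reversed(path)]


# Simulated allelic copy numbers: five normal SNPs, then five SNPs with
# uniparental disomy in a sample containing 30% normal cells.
observations = [(1.00, 1.00), (0.95, 1.05), (1.05, 0.98), (1.02, 1.00), (0.97, 1.03),
                (1.70, 0.30), (1.65, 0.35), (1.72, 0.28), (1.68, 0.32), (1.75, 0.30)]
path = viterbi(observations, normal_fraction=0.3)
for obs, state in zip(observations, path):
    print(obs, state, STATES[state])
```

Passing `normal_fraction` explicitly reflects the abstract's point that modelling the mixture proportion matters: as the normal-cell fraction grows, the expected allelic copies of every state shrink towards (1, 1), so the states lie closer together. Estimating that fraction itself is not shown here.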