Department of Preventive Medicine, Zilkha Neurogenetic Institute and Department of Psychiatry, University of Southern California, Los Angeles, CA 90089, USA.
Bioinformatics. 2013 Dec 1;29(23):2964-70. doi: 10.1093/bioinformatics/btt521. Epub 2013 Sep 9.
The accurate detection of copy number alterations (CNAs) in human genomes is important for understanding susceptibility to cancer and mechanisms of tumor progression. CNA detection in tumors from single nucleotide polymorphism (SNP) genotyping arrays is a challenging problem due to phenomena such as aneuploidy, stromal contamination, genomic waves and intra-tumor heterogeneity, issues that leading methods do not optimally address.
Here we introduce methods and software (PennCNV-tumor) for fast and accurate CNA detection using signal intensity data from SNP genotyping arrays. We estimate stromal contamination by applying a maximum likelihood approach over multiple discrete genomic intervals. By conditioning on signal intensity across the genome, our method accounts for both aneuploidy and genomic waves. Finally, our method uses a hidden Markov model (HMM) to make posterior inferences of CNAs. The model integrates multiple sources of information, including total and allele-specific signal intensity at each SNP as well as physical maps. Using real data from cancer cell lines and patient tumors, we demonstrate substantial improvements in accuracy and computational efficiency compared with existing methods.
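The abstract names the ingredients but not the equations. The following is therefore a minimal Python sketch of the general technique, not the PennCNV-tumor implementation. Every name and value in it is hypothetical: `STATES`, `LRR_SD`, `BAF_SD`, `d_scale`, `mixture_lrr`, `mixture_baf`, `estimate_purity` and `viterbi`. The sketch rests on the following assumptions:

- Total and allele-specific signal are represented by the log R ratio (LRR) and B allele frequency (BAF), each with Gaussian noise.
- Purity is 1 minus the stromal contamination.
- Every SNP is heterozygous in the normal stroma.
- A `ploidy` argument (average tumor copy number) stands in for conditioning on genome-wide intensity.
- An exponential, distance-dependent transition probability stands in for the use of physical maps.

```python
import math

# Hypothetical parameters: illustrative values, not PennCNV-tumor's.
LRR_SD = 0.2   # assumed Gaussian noise SD of the log R ratio (LRR)
BAF_SD = 0.05  # assumed Gaussian noise SD of the B allele frequency (BAF)
# Hypothetical tumor states: (tumor copy number n, number of B alleles b).
STATES = [(1, 0), (2, 1), (2, 0), (3, 1), (4, 2)]


def mixture_lrr(n, purity, ploidy=2.0):
    """Expected LRR of a tumor/stroma mix, relative to the sample's average copy number."""
    total = purity * n + (1 - purity) * 2.0
    reference = purity * ploidy + (1 - purity) * 2.0
    return math.log2(total / reference)


def mixture_baf(n, b, purity):
    """Expected BAF at a SNP that is heterozygous (AB) in the normal stroma."""
    return (purity * b + (1 - purity) * 1.0) / (purity * n + (1 - purity) * 2.0)


def log_normal(x, mu, sd):
    """Log density of a Gaussian with mean mu and standard deviation sd."""
    return -0.5 * ((x - mu) / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))


def snp_loglik(lrr, baf, state, purity, ploidy=2.0):
    """Joint log-likelihood of one SNP's total (LRR) and allele-specific (BAF) signal."""
    n, b = state
    # Fold BAF around 0.5 so (n, b) and (n, n - b) share one likelihood,
    # since either allele may be the one gained or lost.
    return (log_normal(lrr, mixture_lrr(n, purity, ploidy), LRR_SD)
            + log_normal(abs(baf - 0.5), abs(mixture_baf(n, b, purity) - 0.5), BAF_SD))


def estimate_purity(intervals, ploidy=2.0):
    """Grid-search ML purity; each interval takes its best-fitting state at each candidate."""
    def total_loglik(p):
        return sum(max(sum(snp_loglik(lrr, baf, s, p, ploidy) for lrr, baf in snps)
                       for s in STATES)
                   for snps in intervals)
    return max((i / 100 for i in range(5, 101)), key=total_loglik)


def viterbi(snps, positions, purity, ploidy=2.0, d_scale=100_000.0):
    """Most likely state path; switching gets likelier as the gap between SNPs grows."""
    k = len(STATES)
    score = [math.log(1 / k) + snp_loglik(*snps[0], s, purity, ploidy) for s in STATES]
    back = []
    for i in range(1, len(snps)):
        d = positions[i] - positions[i - 1]
        # Assumed exponential decay toward uniform transitions for large gaps.
        p_switch = max(1e-8, (k - 1) / k * (1 - math.exp(-d / d_scale)))
        stay, move = math.log(1 - p_switch), math.log(p_switch / (k - 1))
        new_score, pointers = [], []
        for j, s in enumerate(STATES):
            cands = [score[m] + (stay if m == j else move) for m in range(k)]
            best = max(range(k), key=cands.__getitem__)
            new_score.append(cands[best] + snp_loglik(*snps[i], s, purity, ploidy))
            pointers.append(best)
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    path = [max(range(k), key=score.__getitem__)]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return [STATES[j] for j in reversed(path)]
```

Usage: call `estimate_purity` on a list of intervals, where each interval is a list of `(lrr, baf)` pairs. Then pass the estimate to `viterbi` together with the SNP positions. Viterbi returns only the single most likely state path. Per-SNP posterior probabilities, which are closer to the "posterior inferences" the abstract describes, would come from the forward-backward algorithm over the same emissions and transitions.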