Ontario Institute for Cancer Research, Toronto, ON, M5G 0A3, Canada and Department of Medical Biophysics, University of Toronto, Toronto, ON, M5G 2M9, Canada.
Bioinformatics. 2014 Mar 15;30(6):768-74. doi: 10.1093/bioinformatics/btt611. Epub 2013 Nov 4.
Copy number variations (CNVs) are a major source of genomic variability and are especially significant in cancer. Until recently, microarray technologies were the standard way to characterize CNVs in genomes. Advances in next-generation sequencing technology now offer significant opportunities to deduce copy number directly from genome sequencing data. Unfortunately, cancer genomes differ from normal genomes in several respects that make them far less amenable to copy number detection. For example, cancer genomes are often aneuploid, and tumor samples are typically an admixture of tumor cells and diploid non-tumor cells. In addition, patient-derived xenograft models can be laden with mouse contamination that strongly affects accurate assignment of copy number. Hence, there is a need for analytical tools that take cancer-specific parameters into account when detecting CNVs directly from genome sequencing data.
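To illustrate why admixture complicates copy number assignment, the sketch below uses a common textbook mixing model, not necessarily the one implemented in WaveCNV. The function names and the parameters `purity`, `tumor_ploidy` and `normal_cn` are assumptions introduced here for illustration. The model treats the observed tumor/normal depth ratio of a region as a purity-weighted average of tumor and normal copy number, normalized by the sample's average ploidy.

```python
def expected_ratio(tumor_cn, purity, tumor_ploidy=2.0, normal_cn=2.0):
    """Expected depth ratio of a region under a simple two-population mixing model.

    purity is the fraction of tumor cells in the sample (0 < purity <= 1).
    All parameter names are illustrative assumptions, not WaveCNV options.
    """
    region_copies = purity * tumor_cn + (1.0 - purity) * normal_cn
    average_copies = purity * tumor_ploidy + (1.0 - purity) * normal_cn
    return region_copies / average_copies


def infer_tumor_cn(ratio, purity, tumor_ploidy=2.0, normal_cn=2.0):
    """Invert expected_ratio to recover the tumor copy number of a region."""
    average_copies = purity * tumor_ploidy + (1.0 - purity) * normal_cn
    return (ratio * average_copies - (1.0 - purity) * normal_cn) / purity


# A four-copy region in a 50% pure, diploid-on-average tumor sample.
ratio = expected_ratio(tumor_cn=4, purity=0.5)
print(ratio)                              # 1.5, not the 2.0 seen in a pure sample
print(infer_tumor_cn(ratio, purity=0.5))  # 4.0
```

Under this model, the same four-copy gain yields a ratio of 2.0 in a pure sample but only 1.5 at 50% purity, which is why purity and ploidy need to be accounted for before copy numbers are digitized.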
We have developed WaveCNV, a software package that identifies copy number alterations from next-generation sequencing data by detecting the breakpoints of CNVs with translation-invariant discrete wavelet transforms and assigning a digitized copy number to each event. We also assign alleles specifying the chromosomal ratio following duplication or loss. We verified the copy number calls against both microarray (correlation coefficient 0.97) and quantitative polymerase chain reaction (correlation coefficient 0.94) measurements and found them to be highly concordant. We demonstrate the utility of WaveCNV on pancreatic primary and xenograft sequencing data.
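The general idea of wavelet-based breakpoint detection can be sketched as follows. This is a minimal illustration, not the WaveCNV implementation: it assumes a Haar wavelet, a single scale and a fixed threshold, none of which are specified in the abstract, and the function names are hypothetical. A translation-invariant (undecimated) transform evaluates the detail coefficient at every position rather than on a subsampled grid, so a breakpoint is detected equally well wherever it falls.

```python
import numpy as np


def haar_detail(signal, scale):
    """Undecimated Haar detail coefficients at one scale.

    detail[i] = mean(signal[i:i+scale]) - mean(signal[i-scale:i]),
    computed at every position i where both windows fit; zero elsewhere.
    """
    x = np.asarray(signal, dtype=float)
    n = len(x)
    csum = np.concatenate(([0.0], np.cumsum(x)))  # csum[k] = sum of x[:k]
    detail = np.zeros(n)
    idx = np.arange(scale, n - scale + 1)
    right = csum[idx + scale] - csum[idx]
    left = csum[idx] - csum[idx - scale]
    detail[idx] = (right - left) / scale
    return detail


def find_breakpoints(signal, scale=8, threshold=0.5):
    """Return positions where |detail| is a local maximum above threshold.

    A returned index i means the level changes between bins i-1 and i.
    scale and threshold are illustrative defaults, not WaveCNV settings.
    """
    mag = np.abs(haar_detail(signal, scale))
    breakpoints = []
    for i in range(1, len(mag) - 1):
        if mag[i] >= threshold and mag[i] > mag[i - 1] and mag[i] >= mag[i + 1]:
            breakpoints.append(i)
    return breakpoints


# Noise-free toy profile: baseline, a 30-bin gain, baseline again.
profile = np.concatenate([np.zeros(50), np.ones(30), np.zeros(50)])
print(find_breakpoints(profile))  # → [50, 80]
```

On real read-depth data the profile is noisy, so a practical detector would typically combine several scales and choose the threshold from the estimated noise level rather than fixing it in advance.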
Source code and executables are available at https://github.com/WaveCNV. The segmentation algorithm is implemented in MATLAB, and copy number assignment is implemented in Perl.
Supplementary data are available at Bioinformatics online.