Department of Human Genetics, McGill University, Montréal H3A 1B1, Canada.
Canadian Center for Computational Genomics, Montréal H3A 1A4, Canada.
Nucleic Acids Res. 2018 Aug 21;46(14):7236-7249. doi: 10.1093/nar/gky538.
Copy number variants (CNVs) are known to affect a large portion of the human genome and have been implicated in many diseases. Although whole-genome sequencing (WGS) can help identify CNVs, most analytical methods suffer from limited sensitivity and specificity, especially in regions of low mappability. To address this, we use PopSV, a CNV caller that relies on multiple samples to control for technical variation. We demonstrate that our calls are stable across different types of repeat-rich regions and validate the accuracy of our predictions using orthogonal approaches. Applying PopSV to 640 human genomes, we find that low-mappability regions are approximately 5 times more likely to harbor germline CNVs, in stark contrast to the nearly uniform distribution observed for somatic CNVs in 95 cancer genomes. In addition to known enrichments in segmental duplications and near centromeres and telomeres, we also report that CNVs are enriched in specific types of satellite repeats and in some of the most recent families of transposable elements. Finally, using this comprehensive approach, we identify 3455 regions with recurrent CNVs that were missing from existing catalogs. In particular, we identify 347 genes with a novel exonic CNV in low-mappability regions, including 29 genes previously associated with disease.
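The idea of using multiple samples to control for technical variation can be illustrated with a minimal sketch. It is not PopSV's actual implementation. It assumes read counts have already been tallied per genomic bin and normalized across samples, and the function name `depth_z_scores` is hypothetical. Each bin of a test sample is scored against the mean and standard deviation of the same bin in a set of reference samples, so a bin that is consistently noisy across the population (for example, in a low-mappability region) needs a larger deviation to stand out.

```python
import numpy as np

def depth_z_scores(test_counts, reference_counts):
    """Z-score each bin of one sample against a panel of reference samples.

    test_counts:      shape (n_bins,), normalized read counts for the test sample
    reference_counts: shape (n_samples, n_bins), normalized counts for references
    """
    test = np.asarray(test_counts, dtype=float)
    ref = np.asarray(reference_counts, dtype=float)

    # Per-bin mean and sample standard deviation across the reference panel.
    mean = ref.mean(axis=0)
    sd = ref.std(axis=0, ddof=1)

    # Bins with no variation across references cannot be tested; mark them NaN.
    sd[sd == 0] = np.nan

    return (test - mean) / sd

# Toy data (made up for illustration): 3 reference samples, 2 bins.
reference = [[100, 50], [110, 50], [90, 50]]
z = depth_z_scores([130, 50], reference)
```

A strongly positive score would suggest a gain and a strongly negative score a loss in that bin. Any real caller would add steps this sketch leaves out, such as multiple-testing correction and merging of adjacent bins.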