Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine, Stanford, California, USA.
Department of Genetics, Stanford University School of Medicine, Stanford, California, USA.
J Med Genet. 2018 Nov;55(11):735-743. doi: 10.1136/jmedgenet-2018-105272. Epub 2018 Jul 30.
Copy number variation (CNV) analysis is an integral component of the study of human genomes in both research and clinical settings. Array-based CNV analysis is the current first-tier approach in clinical cytogenetics. Decreasing costs in high-throughput sequencing and cloud computing have opened the door to sequencing-based CNV analysis pipelines with fast turnaround times. We carried out a systematic and quantitative comparative analysis of several low-coverage whole-genome sequencing (WGS) strategies for detecting CNVs in the human genome.
We compared the CNV detection capabilities of WGS strategies (short insert, 3 kb insert mate pair and 5 kb insert mate pair) each at 1×, 3× and 5× coverages relative to each other and to 17 currently used high-density oligonucleotide arrays. For benchmarking, we used a set of gold standard (GS) CNVs generated for the 1000 Genomes Project CEU subject NA12878.
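The benchmarking idea above can be illustrated with a minimal sketch. It assumes a 50% reciprocal-overlap criterion for matching a call to a GS CNV. That threshold, the interval representation, and the function names `reciprocal_overlap` and `benchmark` are illustrative assumptions, not details taken from the study.

```python
from typing import List, Tuple

# (chromosome, start, end), half-open coordinates; a hypothetical representation
Interval = Tuple[str, int, int]


def reciprocal_overlap(a: Interval, b: Interval) -> float:
    """Return the smaller of the two overlap fractions (0.0 if disjoint)."""
    if a[0] != b[0]:
        return 0.0
    overlap = min(a[2], b[2]) - max(a[1], b[1])
    if overlap <= 0:
        return 0.0
    return min(overlap / (a[2] - a[1]), overlap / (b[2] - b[1]))


def benchmark(calls: List[Interval], gold: List[Interval],
              min_ro: float = 0.5) -> Tuple[int, float]:
    """Count GS CNVs matched by any call, and the fraction of calls
    that match no GS CNV (assumed threshold: min_ro)."""
    detected = sum(
        1 for g in gold
        if any(reciprocal_overlap(c, g) >= min_ro for c in calls)
    )
    unvalidated = sum(
        1 for c in calls
        if not any(reciprocal_overlap(c, g) >= min_ro for g in gold)
    )
    fraction_unvalidated = unvalidated / len(calls) if calls else 0.0
    return detected, fraction_unvalidated


# Toy data, not from the study
gold = [("chr1", 1000, 2000), ("chr1", 5000, 9000)]
calls = [("chr1", 1100, 2100), ("chr2", 1000, 2000), ("chr1", 5000, 6000)]
detected, fraction_unvalidated = benchmark(calls, gold)
```

The two returned values correspond to the two quantities compared across platforms: the number of GS CNVs detected and the share of calls without validation.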
Overall, low-coverage WGS strategies detect drastically more GS CNVs than arrays and are accompanied by smaller percentages of unvalidated CNV calls. Furthermore, we show that WGS (at ≥1× coverage) is able to detect all seven GS deletion CNVs >100 kb in NA12878, whereas most arrays detect only one. Lastly, we show that the much larger 15 Mbp Cri du chat deletion can be readily detected with short-insert paired-end WGS even at just 1× coverage.
CNV analysis using low-coverage WGS is efficient and outperforms the array-based analysis that is currently used for clinical cytogenetics.