BGI-Shenzhen, Shenzhen, China.
BGI-Shenzhen, Shenzhen, China; State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, China.
PLoS One. 2014 Jan 21;9(1):e85096. doi: 10.1371/journal.pone.0085096. eCollection 2014.
BACKGROUND: Copy number variations (CNVs) are an important type of genetic variation that deeply impacts phenotypic polymorphisms and human diseases. The advent of high-throughput sequencing technologies provides an opportunity to revolutionize the discovery of CNVs and to explore their relationship with diseases. However, most existing methods depend on sequencing depth and become unstable at low sequence coverage. In this study, we developed an effective population-scale CNV calling (PSCC) method based on low coverage whole-genome sequencing (LCS).
METHODOLOGY/PRINCIPAL FINDINGS: Our method uses a two-step correction to remove biases caused by local GC content and complex genomic characteristics. We chose a binary segmentation method to locate CNV segments and designed combined statistical tests to keep the false positive rate under stable control. On simulated data, PSCC achieved sensitivity/specificity of 99.7%/100% under LCS (∼2×) and 98.6%/100% under ultra LCS (∼0.2×) when calling CNVs larger than 300 kb. Finally, we applied the method to 34 clinical samples sequenced at an average coverage of 2×. All 31 pathogenic CNVs identified by array comparative genomic hybridization (aCGH) were successfully detected. In addition, a performance comparison showed that our method had significant advantages over existing methods on ultra LCS data.
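As a rough illustration of the ideas named in the methodology (depth correction followed by binary segmentation), the following Python sketch may help. It is not the PSCC implementation: the function names, the GC-stratum median scaling, the per-bin cohort median used as a population reference, and the fixed `threshold` that stands in for the combined statistical tests are all assumptions made for this example.

```python
from statistics import median


def gc_correct(depths, gc_fractions, n_strata=20):
    """Assumed step 1: rescale each bin so that every GC stratum
    has the same median depth as the whole sample."""
    def stratum(gc):
        return min(int(gc * n_strata), n_strata - 1)

    global_median = median(depths)
    by_stratum = {}
    for d, gc in zip(depths, gc_fractions):
        by_stratum.setdefault(stratum(gc), []).append(d)
    stratum_median = {k: median(v) for k, v in by_stratum.items()}
    corrected = []
    for d, gc in zip(depths, gc_fractions):
        m = stratum_median[stratum(gc)]
        corrected.append(d * global_median / m if m > 0 else d)
    return corrected


def population_normalize(sample, cohort):
    """Assumed step 2: divide each bin by the median depth of that
    bin across a cohort, giving a copy ratio near 1.0 for normal bins."""
    ratios = []
    for j, d in enumerate(sample):
        ref = median(s[j] for s in cohort)
        ratios.append(d / ref if ref > 0 else 1.0)
    return ratios


def best_split(values, min_size):
    """Find the split index that maximizes the mean-shift score."""
    n = len(values)
    prefix = [0.0]
    for v in values:
        prefix.append(prefix[-1] + v)
    best_i, best_score = None, 0.0
    for i in range(min_size, n - min_size + 1):
        left_mean = prefix[i] / i
        right_mean = (prefix[n] - prefix[i]) / (n - i)
        score = (left_mean - right_mean) ** 2 * i * (n - i) / n
        if score > best_score:
            best_i, best_score = i, score
    return best_i, best_score


def binary_segmentation(values, threshold, min_size=2, offset=0):
    """Recursively split the ratio profile; return sorted breakpoints.
    The fixed threshold is a placeholder for a proper significance test."""
    if len(values) < 2 * min_size:
        return []
    i, score = best_split(values, min_size)
    if i is None or score < threshold:
        return []
    left = binary_segmentation(values[:i], threshold, min_size, offset)
    right = binary_segmentation(values[i:], threshold, min_size, offset + i)
    return left + [offset + i] + right


# Toy profile: a 6-bin gain (ratio 1.5) inside a normal region (ratio 1.0).
profile = [1.0] * 10 + [1.5] * 6 + [1.0] * 10
breakpoints = binary_segmentation(profile, threshold=0.1)
```

On the toy profile, the two recovered breakpoints delimit the simulated gain. With real low coverage data the bins are noisy, so a fixed threshold like this one would need to be replaced by a statistical test that controls false positives.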
CONCLUSIONS/SIGNIFICANCE: Our study showed that PSCC can sensitively and reliably detect CNVs using low coverage or even ultra-low coverage data through population-scale sequencing.