Yoon Seungtai, Xuan Zhenyu, Makarov Vladimir, Ye Kenny, Sebat Jonathan
Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA.
Genome Res. 2009 Sep;19(9):1586-92. doi: 10.1101/gr.092981.109. Epub 2009 Aug 5.
Methods for the direct detection of copy number variation (CNV) genome-wide have become effective instruments for identifying genetic risk factors for disease. The application of next-generation sequencing platforms to genetic studies promises to improve sensitivity to detect CNVs as well as inversions, indels, and SNPs. New computational approaches are needed to systematically detect these variants from genome sequence data. Existing sequence-based approaches for CNV detection are primarily based on paired-end read mapping (PEM) as reported previously by Tuzun et al. and Korbel et al. Due to limitations of the PEM approach, some classes of CNVs are difficult to ascertain, including large insertions and variants located within complex genomic regions. To overcome these limitations, we developed a method for CNV detection using read depth of coverage. Event-wise testing (EWT) is a method based on significance testing. In contrast to standard segmentation algorithms that typically operate by performing likelihood evaluation for every point in the genome, EWT works on intervals of data points, rapidly searching for specific classes of events. Overall false-positive rate is controlled by testing the significance of each possible event and adjusting for multiple testing. Deletions and duplications detected in an individual genome by EWT are examined across multiple genomes to identify polymorphism between individuals. We estimated error rates using simulations based on real data, and we applied EWT to the analysis of chromosome 1 from paired-end shotgun sequence data (30x) on five individuals. Our results suggest that analysis of read depth is an effective approach for the detection of CNVs, and it captures structural variants that are refractory to established PEM-based methods.
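The interval-based testing described in the abstract can be illustrated with a minimal sketch. The abstract does not give the algorithm's details, so everything below is an assumption made for illustration: read depth is summarized as one count per fixed-size window, counts are treated as approximately normal and independent under the null, and the per-window threshold is a Bonferroni-style correction over the number of intervals tested. The names `upper_tail` and `call_events` are hypothetical, and only a single interval length is scanned.

```python
import math
from statistics import mean, stdev


def upper_tail(z):
    """P(Z > z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def call_events(depths, length, fpr=0.05):
    """Flag runs of `length` consecutive windows with jointly high or low depth.

    Assumptions (not taken from the abstract): window depths are roughly
    normal, independent under the null, and not all identical.
    """
    n = len(depths)
    mu, sd = mean(depths), stdev(depths)
    z = [(d - mu) / sd for d in depths]

    # If each of `length` independent windows has tail probability below t,
    # the joint probability is t**length. Roughly n/length intervals are
    # tested, so require t**length = fpr / (n / length).
    t = (fpr * length / n) ** (1.0 / length)

    events = []
    for start in range(n - length + 1):
        interval = z[start:start + length]
        if max(upper_tail(v) for v in interval) < t:
            events.append((start, start + length, "gain"))
        elif max(upper_tail(-v) for v in interval) < t:
            events.append((start, start + length, "loss"))
    return events


# Toy data: depth near 30 with a six-window drop to 10.
depths = [30 + (i % 3 - 1) for i in range(200)]
depths[100:106] = [10] * 6
print(call_events(depths, length=6))
```

Requiring every window in an interval to pass the threshold is what makes this an event-level test rather than a point-by-point one: a single outlying window cannot trigger a call. A fuller version would repeat the scan over a range of interval lengths and merge overlapping calls.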