Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195, USA.
Genome Res. 2012 Aug;22(8):1525-32. doi: 10.1101/gr.138115.112. Epub 2012 May 14.
While exome sequencing is readily amenable to single-nucleotide variant discovery, the sparse and nonuniform nature of the exome capture reaction has hindered exome-based detection and characterization of genic copy number variation. We developed a novel method using singular value decomposition (SVD) normalization to discover rare genic copy number variants (CNVs) as well as genotype copy number polymorphic (CNP) loci with high sensitivity and specificity from exome sequencing data. We estimate the precision of our algorithm using 122 trios (366 exomes) and show that this method can be used to reliably predict (94% overall precision) both de novo and inherited rare CNVs involving three or more consecutive exons. We demonstrate that exome-based genotyping of CNPs strongly correlates with whole-genome data (median r² = 0.91), especially for loci with fewer than eight copies, and can estimate the absolute copy number of multi-allelic genes with high accuracy (78% call level). The resulting user-friendly computational pipeline, CoNIFER (copy number inference from exome reads), can reliably be used to discover disruptive genic CNVs missed by standard approaches and should have broad application in human genetic studies of disease.
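The general idea of SVD normalization can be illustrated with a minimal sketch. This is not CoNIFER's code, and the abstract does not specify the implementation: the function name `svd_normalize`, the per-exon z-scoring by mean and standard deviation, and the `n_remove` parameter are all assumptions made for illustration. The sketch takes an exon-by-sample matrix of read-depth values, standardizes each exon across samples, and zeroes the largest singular values, on the assumption that these capture systematic capture and batch effects shared across samples rather than sample-specific copy number changes.

```python
import numpy as np


def svd_normalize(depth, n_remove=2):
    """Remove the strongest shared components from a read-depth matrix.

    depth    : array of shape (n_exons, n_samples), e.g. RPKM per exon.
    n_remove : number of leading singular values to zero out
               (a hypothetical tuning parameter for this sketch).
    Returns an array of the same shape with those components removed.
    """
    depth = np.asarray(depth, dtype=float)

    # Standardize each exon (row) across samples so that exons with
    # different baseline coverage become comparable. Assumption: mean/std
    # z-scores; a real pipeline may use a different transform.
    mu = depth.mean(axis=1, keepdims=True)
    sd = depth.std(axis=1, keepdims=True)
    sd[sd == 0] = 1.0  # avoid division by zero for constant exons
    z = (depth - mu) / sd

    # Thin SVD: z = U @ diag(s) @ Vt, with s sorted in descending order.
    U, s, Vt = np.linalg.svd(z, full_matrices=False)

    # Zero the leading singular values, then rebuild the matrix.
    s_kept = s.copy()
    s_kept[:n_remove] = 0.0
    return (U * s_kept) @ Vt
```

In a sketch like this, a candidate deletion or duplication would show up as a run of consecutive exons in one sample whose normalized values sit well below or above zero; how many components to remove and which threshold to apply are choices that would need to be tuned on real data.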