Department of Biomedical Engineering, Tulane University, New Orleans, LA, USA.
BMC Bioinformatics. 2013 May 2;14:150. doi: 10.1186/1471-2105-14-150.
Copy number variation (CNV) is an important type of structural variation (SV) in the human genome. Various studies have shown that CNVs are associated with complex diseases. Traditional CNV detection methods such as fluorescence in situ hybridization (FISH) and array comparative genomic hybridization (aCGH) suffer from low resolution. Next-generation sequencing (NGS) promises higher-resolution detection of CNVs, and several methods have recently been proposed to realize this promise. However, the performance of these methods is not robust under some conditions; for example, some of them may fail to detect short CNVs. There is therefore a strong demand for reliable detection of CNVs from high-resolution NGS data.
A novel and robust method to detect CNVs from short sequencing reads is proposed in this study. CNV detection is modeled as a change-point detection problem on the read depth (RD) signal derived from NGS data, which is fitted with a total variation (TV) penalized least squares model. The performance (e.g., sensitivity and specificity) of the proposed approach is evaluated by comparison with several recently published methods on both simulated data and real data from the 1000 Genomes Project.
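To make the modeling idea concrete, the following is a minimal sketch of a TV-penalized least squares fit on a toy RD signal, assuming the standard one-dimensional objective 0.5*||y - x||^2 + lam * sum_i |x[i+1] - x[i]|. The solver (projected gradient on the dual problem), the function names `tv_denoise` and `change_points`, the penalty `lam`, the jump threshold, and the toy read depths are all illustrative assumptions and are not taken from the study.

```python
import numpy as np


def tv_denoise(y, lam, n_iter=5000):
    """Approximately minimize 0.5*||y - x||^2 + lam * sum_i |x[i+1] - x[i]|.

    Generic projected-gradient solver on the dual problem (an assumption
    made for illustration, not the algorithm used in the study).
    """
    y = np.asarray(y, dtype=float)
    z = np.zeros(y.size - 1)  # one dual variable per pair of adjacent bins
    x = y.copy()
    for _ in range(n_iter):
        # Gradient step on the dual (step 1/4 is safe because the squared
        # norm of the difference operator is at most 4), then project
        # onto the box [-lam, lam].
        z = np.clip(z + 0.25 * np.diff(x), -lam, lam)
        # Recover the primal estimate x = y - D^T z, where D is the
        # forward-difference operator.
        padded = np.concatenate(([0.0], z, [0.0]))
        x = y + np.diff(padded)
    return x


def change_points(x, min_jump):
    """Return indices where the fitted signal jumps by more than min_jump."""
    return np.flatnonzero(np.abs(np.diff(x)) > min_jump) + 1


# Toy noise-free RD signal: baseline depth 30 with a gain to 45 in bins 20..29.
rd = np.concatenate((np.full(20, 30.0), np.full(10, 45.0), np.full(20, 30.0)))
fitted = tv_denoise(rd, lam=5.0)
breaks = change_points(fitted, min_jump=5.0)
print(breaks)
```

The fitted signal is piecewise constant, so candidate CNV boundaries can be read off as the positions where adjacent fitted values differ. In practice the choice of `lam` controls how many segments survive, and would need to be tuned or selected by some criterion.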
The experimental results showed that, compared with several existing methods, both the true positive rate and the false positive rate of the proposed detection method do not change significantly for CNVs with different copy numbers and lengths. Therefore, our proposed approach results in more reliable detection of CNVs than the existing methods.