Computer Science and Engineering Department, University of Connecticut, Storrs, CT, USA.
Computer Science and Engineering Department and Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA.
BMC Bioinformatics. 2018 Oct 22;19(Suppl 11):361. doi: 10.1186/s12859-018-2332-x.
Due to recent advances in sequencing technologies, sequence-based analysis has been widely applied to detecting copy number variations (CNVs). There are several techniques for identifying CNVs from next-generation sequencing (NGS) data; however, methods based on depth of coverage, or read depth (RD), have recently become a main technique. The main assumption of RD-based CNV detection methods is that the read count at a specific genomic location is correlated with the copy number at that location. However, noise and biases in read count data distort the association between read counts and copy numbers. For more accurate CNV identification, these biases and noise need to be mitigated. In this work, to detect CNVs more precisely and efficiently, we propose a novel denoising method based on the total variation approach and the Taut String algorithm.
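As an illustration of the total variation idea, the sketch below denoises a one-dimensional read-depth signal y by approximately minimising 0.5*sum((y - x)^2) + lam*sum(|x[i+1] - x[i]|). It is not the Taut String algorithm: it is a simple projected-gradient iteration on the dual of that objective, swapped in because it is short. The function name `tv_denoise_1d`, the parameter `lam`, the iteration count, and the toy signal are assumptions made for this example.

```python
import numpy as np


def tv_denoise_1d(y, lam, n_iter=2000, step=0.25):
    """Approximately minimise 0.5*sum((y - x)**2) + lam*sum(|x[i+1] - x[i]|).

    Illustrative dual projected-gradient solver, NOT the Taut String algorithm.
    step=0.25 is chosen because the squared norm of the forward-difference
    operator is at most 4.
    """
    y = np.asarray(y, dtype=float)
    z = np.zeros(y.size - 1)  # dual variable, one entry per adjacent pair
    for _ in range(n_iter):
        # Primal estimate from the dual: x = y - D^T z (D = forward difference).
        x = y + np.diff(np.concatenate(([0.0], z, [0.0])))
        # Gradient step on the dual, then projection onto the box |z| <= lam.
        z = np.clip(z + step * np.diff(x), -lam, lam)
    return y + np.diff(np.concatenate(([0.0], z, [0.0])))


# Toy read-depth profile with a single copy-number change (hypothetical data).
read_depth = np.array([0, 0, 0, 0, 4, 4, 4, 4], dtype=float)
denoised = tv_denoise_1d(read_depth, lam=0.5)
```

The penalty on adjacent differences favours piecewise-constant solutions, so the breakpoint in the toy profile stays sharp while each level is pulled slightly toward the other. The mean of the signal is preserved.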
To investigate the performance of the proposed denoising method, we computed the sensitivity, false discovery rate, and specificity of CNV detection with denoising, using both simulated and real data. We also compared the proposed denoising method, Taut String, with commonly used approaches such as moving average (MA) and discrete wavelet transforms (DWT) in terms of sensitivity in detecting true CNVs and time complexity. The results show that Taut String works better than DWT and MA and has greater power to identify very narrow CNVs. The ability of Taut String denoising to preserve the breakpoints of CNV segments and narrow CNVs increases the detection accuracy of segmentation algorithms, resulting in higher sensitivities and lower false discovery rates.
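The three evaluation measures named above can be sketched as follows, assuming (for illustration only) that truth and calls are compared per genomic bin as boolean labels. The abstract does not state the unit of comparison, and the function name `cnv_detection_metrics` is hypothetical.

```python
import numpy as np


def cnv_detection_metrics(truth, called):
    """Per-bin sensitivity, false discovery rate, and specificity.

    truth, called: boolean sequences, True where a bin is (or is called) a CNV.
    Per-bin comparison is an assumption of this sketch.
    """
    truth = np.asarray(truth, dtype=bool)
    called = np.asarray(called, dtype=bool)
    tp = int(np.sum(truth & called))    # CNV bins correctly called
    fp = int(np.sum(~truth & called))   # normal bins called as CNV
    fn = int(np.sum(truth & ~called))   # CNV bins missed
    tn = int(np.sum(~truth & ~called))  # normal bins correctly left alone

    def ratio(num, den):
        # Return NaN rather than raise when a denominator is empty.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "fdr": ratio(fp, tp + fp),
        "specificity": ratio(tn, tn + fp),
    }


metrics = cnv_detection_metrics(
    truth=[1, 1, 0, 0, 0, 1],
    called=[1, 0, 1, 0, 0, 1],
)
```

Here a higher sensitivity and specificity and a lower false discovery rate indicate better detection.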
In this study, we proposed a new denoising method for sequence-based CNV detection based on a signal processing technique. Existing CNV detection algorithms identify many false CNV segments and fail to detect short CNV segments because of noise and biases. Employing an effective and efficient denoising method can significantly enhance the detection accuracy of CNV segmentation algorithms. Advanced denoising methods from the signal processing field can be employed for this purpose. We showed that non-linear denoising methods that take into account the sparsity and piecewise-constant characteristics of CNV data result in better CNV detection performance.
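A small example can show why a linear smoother such as a moving average struggles with short segments. The sketch below applies a moving average to a hypothetical noise-free profile containing a three-bin CNV; the window length of 11 and the signal itself are assumptions chosen for illustration, not values from the study.

```python
import numpy as np


def moving_average(y, window):
    """Centred moving average via convolution with a flat kernel."""
    kernel = np.ones(window) / window
    return np.convolve(np.asarray(y, dtype=float), kernel, mode="same")


# Hypothetical profile: 50 bins at baseline with a 3-bin gain of height 1.
signal = np.zeros(50)
signal[24:27] = 1.0

smoothed = moving_average(signal, window=11)

peak = smoothed.max()                         # amplitude left after smoothing
spread = int(np.count_nonzero(smoothed > 1e-12))  # bins affected by the gain
```

With a window wider than the segment, the gain is reduced to a fraction of its original height and smeared over neighbouring bins, so both the amplitude and the breakpoints that a segmentation algorithm relies on are degraded.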