IEEE Trans Biomed Eng. 2018 Feb;65(2):353-364. doi: 10.1109/TBME.2017.2769677.
Copy-number variations (CNVs) are associated with complex diseases and particular tumor types. Array-based comparative genomic hybridization (aCGH) is a common approach for detecting CNVs. Traditional CNV detection methods for multiple aCGH samples mainly use batch samples to find common variations, without accounting for the individual characteristics of each sample. Accurately differentiating the commonly shared and the individual CNV patterns is pivotal for identifying cell populations and for distinguishing cell growth (as in cancer) from the invasion of new cells. Our preliminary results demonstrate that, after a wavelet transform, the shared and the individual CNV patterns each exhibit distinctive characteristics.
To exploit these characteristics, we formulate a quadratic data-separation problem in the wavelet space that discriminates the shared and individual CNVs from the raw data. We derive a numerical solution and show that it can be obtained by solving decoupled subproblems. This keeps computational costs low, enabling efficient application to the analysis of large sequencing datasets.
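The abstract does not state the WaveDec objective or which wavelet it uses, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes an orthonormal Haar transform and a hypothetical per-coefficient quadratic objective, sum_k (c_kj - s_j - u_kj)^2 + lam_j * sum_k u_kj^2 + mu_j * s_j^2, where c_kj is wavelet coefficient j of sample k, s_j is the shared coefficient and u_kj the individual one. Because no term couples different coefficients j, the problem splits into n independent subproblems, each with a closed-form solution, so the cost stays linear in the number of samples and probes. The names `haar_dwt`, `haar_idwt`, `separate`, `lam` and `mu` are illustrative only.

```python
import numpy as np


def haar_dwt(x):
    """Full orthonormal Haar decomposition of a length-2**L signal.

    Returns [approximation, details from coarsest to finest level]."""
    a = np.asarray(x, dtype=float)
    n = a.size
    if n == 0 or n & (n - 1):
        raise ValueError("signal length must be a power of two")
    details = []
    while a.size > 1:
        even, odd = a[0::2], a[1::2]
        details.append((even - odd) / np.sqrt(2.0))  # detail at this level
        a = (even + odd) / np.sqrt(2.0)              # approximation for next level
    return np.concatenate([a] + details[::-1])


def haar_idwt(c):
    """Inverse of haar_dwt (reads coefficients from coarsest to finest)."""
    c = np.asarray(c, dtype=float)
    a, pos = c[:1], 1
    while pos < c.size:
        d = c[pos:pos + a.size]
        pos += a.size
        out = np.empty(2 * a.size)
        out[0::2] = (a + d) / np.sqrt(2.0)
        out[1::2] = (a - d) / np.sqrt(2.0)
        a = out
    return a


def separate(Y, lam, mu=0.0):
    """Hypothetical shared/individual split of K profiles in Haar space.

    For each coefficient j independently, minimises
        sum_k (c_kj - s_j - u_kj)**2 + lam_j * sum_k u_kj**2 + mu_j * s_j**2
    lam (> 0) and mu (>= 0) are scalars or length-n arrays (assumed penalties).
    Returns (shared profile, individual profiles) in the signal domain."""
    C = np.array([haar_dwt(y) for y in Y])           # (K, n) coefficients
    K, n = C.shape
    lam = np.broadcast_to(np.asarray(lam, dtype=float), (n,))
    mu = np.broadcast_to(np.asarray(mu, dtype=float), (n,))
    if np.any(lam <= 0) or np.any(mu < 0):
        raise ValueError("need lam > 0 and mu >= 0")
    # Eliminating u_kj leaves a 1-D quadratic in s_j with weight a_j.
    a = lam / (1.0 + lam)
    s = a * C.sum(axis=0) / (a * K + mu)             # shared coefficients
    U = (C - s) / (1.0 + lam)                        # individual coefficients
    shared = haar_idwt(s)
    individual = np.array([haar_idwt(u) for u in U])
    return shared, individual
```

A piecewise-constant profile such as eight zeros followed by eight ones has only two nonzero Haar coefficients, which is one plausible (assumed) sense in which CNV segments look distinctive after a wavelet transform. Note that with the same `lam` and `mu` on every coefficient, an orthonormal transform makes this sketch equivalent to solving the problem directly on the raw profiles; the wavelet space only matters once the penalties vary by scale, for example treating fine-scale individual coefficients differently from coarse ones (again an assumption).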
The advantages of our proposed method, called WaveDec, have been demonstrated by comparison with popular CNV-detection methods using synthetic and empirical aCGH data. The performance of WaveDec was further validated by experiments with single-cell-sequencing data.
WaveDec can successfully differentiate shared and individual patterns, and performs well even in data contaminated with high levels of noise.
Both the shared and individual patterns can be uniquely characterized as well as effectively decomposed within the wavelet space.