Alshawaqfeh Mustafa, Al Kawam Ahmad, Serpedin Erchin, Datta Aniruddha
IEEE/ACM Trans Comput Biol Bioinform. 2020 May-Jun;17(3):1056-1067. doi: 10.1109/TCBB.2018.2878560. Epub 2018 Oct 30.
The study of recurrent copy number variations (CNVs) plays an important role in understanding the onset and evolution of complex diseases such as cancer. Array-based comparative genomic hybridization (aCGH) is a widely used microarray technology for identifying CNVs. However, because of high noise levels and inter-sample variability, detecting recurrent CNVs from aCGH data remains challenging. This paper proposes a novel method for identifying recurrent CNVs. The method models the noisy aCGH data as the superposition of three matrices: a full-rank matrix of weighted piece-wise generating signals that accounts for the clean aCGH data, a Gaussian noise matrix that models inherent experimental errors and other sources of error, and a sparse matrix that captures the sparse inter-sample (sample-specific) variations. We demonstrate that the method accurately separates recurrent CNVs from sample-specific variations and noise in both simulated (artificial) data and real data. The proposed method produces more accurate results than current state-of-the-art methods for recurrent CNV detection and is robust to noise and sample-specific variations.
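The three-matrix model described in the abstract can be illustrated with a small simulation. The following is a minimal sketch, not the authors' method: it only generates toy data of the form observed = clean + sparse + noise and does not attempt the decomposition itself. The function name `simulate_acgh`, the region boundaries, the weight range, and all parameter values are hypothetical choices made for illustration. The toy clean matrix is built from only two generating signals, so it does not reproduce the rank structure stated in the abstract.

```python
import random


def simulate_acgh(n_probes=200, n_samples=8, noise_sd=0.2,
                  sparse_frac=0.02, seed=0):
    """Toy data: observed = clean + sparse + noise (probes x samples).

    All names, regions, and parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)

    # Two piece-wise constant generating signals: one recurrent gain
    # region and one recurrent loss region (boundaries are arbitrary).
    gain = [1.0 if 40 <= i < 70 else 0.0 for i in range(n_probes)]
    loss = [-1.0 if 120 <= i < 150 else 0.0 for i in range(n_probes)]

    # Each sample carries the shared signals with its own weights.
    w_gain = [rng.uniform(0.5, 1.5) for _ in range(n_samples)]
    w_loss = [rng.uniform(0.5, 1.5) for _ in range(n_samples)]
    clean = [[gain[i] * w_gain[j] + loss[i] * w_loss[j]
              for j in range(n_samples)]
             for i in range(n_probes)]

    # Sparse sample-specific variations: only a few nonzero entries.
    sparse = [[0.0] * n_samples for _ in range(n_probes)]
    for _ in range(int(sparse_frac * n_probes * n_samples)):
        i = rng.randrange(n_probes)
        j = rng.randrange(n_samples)
        sparse[i][j] = rng.choice([-2.0, 2.0])

    # Dense Gaussian noise standing in for experimental error.
    noise = [[rng.gauss(0.0, noise_sd) for _ in range(n_samples)]
             for _ in range(n_probes)]

    # The observed matrix is the superposition of the three components.
    observed = [[clean[i][j] + sparse[i][j] + noise[i][j]
                 for j in range(n_samples)]
                for i in range(n_probes)]
    return observed, clean, sparse, noise


if __name__ == "__main__":
    observed, clean, sparse, noise = simulate_acgh()
    print(len(observed), "probes x", len(observed[0]), "samples")
```

Data generated this way gives a known ground truth (`clean` and `sparse`), which is the kind of setting in which a recovery method could be checked against simulated data.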