Huang Heng, Nguyen Nha, Oraintara Soontorn, Vo An
Department of Computer Science and Engineering, University of Texas at Arlington, TX, USA.
BMC Genomics. 2008 Sep 16;9 Suppl 2(Suppl 2):S17. doi: 10.1186/1471-2164-9-S2-S17.
Array-based comparative genomic hybridization (array CGH) is a highly efficient technique that allows the simultaneous measurement of genomic DNA copy number at hundreds or thousands of loci and the reliable detection of local one-copy-level variations. Characterizing these DNA copy number changes is important both for the basic understanding of cancer and for its diagnosis. To develop effective methods for identifying regions of aberration in array CGH data, much recent research has focused on both smoothing-based and segmentation-based data processing. In this paper, we propose an approach based on the stationary wavelet packet transform to smooth array CGH data. Our aim is to remove CGH noise across the whole frequency range while preserving the true signal by using a bivariate model.
On both synthetic and real CGH data, the stationary wavelet packet transform (SWPT) is the best wavelet transform for analyzing CGH signals across the whole frequency range. We also introduce a new bivariate shrinkage model that describes the relationship between noisy CGH coefficients at two scales of the SWPT. Before smoothing, symmetric extension is applied as a preprocessing step to preserve information at the borders.
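The abstract does not specify the filters or the boundary handling used. As a minimal sketch, assuming Haar filters, undecimated (à trous) filtering with circular indexing on the already-extended signal, and hypothetical helper names (`symmetric_extend`, `haar_split`, `haar_merge`, `swpt`, `iswpt`), the code below illustrates the two ideas in this paragraph. The first is a stationary packet decomposition that splits both the low- and the high-frequency band at every level without downsampling, so that the whole frequency range is analyzed and every subband keeps the length of its input. The second is a half-sample symmetric extension that mirrors samples at each border before filtering.

```python
import math

SQRT2 = math.sqrt(2.0)


def symmetric_extend(x, pad):
    """Half-sample symmetric extension: mirror `pad` samples at each border."""
    if pad == 0:
        return list(x)
    if pad > len(x):
        raise ValueError("pad must not exceed the signal length")
    return list(x[:pad])[::-1] + list(x) + list(x[-pad:])[::-1]


def haar_split(x, step):
    """One undecimated Haar analysis step with filter taps `step` samples apart."""
    n = len(x)
    low = [(x[i] + x[(i + step) % n]) / SQRT2 for i in range(n)]
    high = [(x[i] - x[(i + step) % n]) / SQRT2 for i in range(n)]
    return low, high


def haar_merge(low, high, step):
    """Inverse of haar_split: average the two redundant estimates of each sample."""
    n = len(low)
    out = []
    for i in range(n):
        j = (i - step) % n
        est_here = (low[i] + high[i]) / SQRT2  # from the pair starting at i
        est_prev = (low[j] - high[j]) / SQRT2  # from the pair starting at i - step
        out.append(0.5 * (est_here + est_prev))
    return out


def swpt(x, levels):
    """Stationary wavelet packet transform: split EVERY band at every level.

    Returns 2**levels subbands in natural (not frequency) order, each len(x) long.
    """
    nodes = [list(x)]
    for j in range(levels):
        step = 2 ** j  # undecimated: taps spread by 2**j instead of downsampling
        children = []
        for node in nodes:
            low, high = haar_split(node, step)
            children.extend([low, high])
        nodes = children
    return nodes


def iswpt(nodes, levels):
    """Invert swpt by merging sibling pairs from the finest level upward."""
    for j in reversed(range(levels)):
        step = 2 ** j
        nodes = [haar_merge(nodes[k], nodes[k + 1], step)
                 for k in range(0, len(nodes), 2)]
    return nodes[0]


# Usage: extend, decompose into 4 full-length bands, reconstruct, crop.
signal = [0.1, 0.0, 0.2, 1.1, 0.9, 1.0, 0.1, -0.1]
pad = 4
extended = symmetric_extend(signal, pad)
bands = swpt(extended, levels=2)
recovered = iswpt(bands, levels=2)[pad:pad + len(signal)]
```

Mirroring rather than zero-padding avoids introducing an artificial jump at the ends of a copy number profile, which the filters would otherwise spread into the border coefficients.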
We have designed two methods, SWPT and SWPT-Bi, which smooth array CGH data using the stationary wavelet packet transform with hard thresholding and with the new bivariate shrinkage estimator, respectively. We demonstrate the effectiveness of our approach through theoretical and experimental analysis of a set of array CGH data that includes both synthetic and real data. The comparison results show that our method outperforms previous approaches.
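The abstract names the two estimators but does not give the authors' new bivariate model. The sketch below is therefore a stand-in, not their method. It shows hard thresholding as used by SWPT and, in place of the new estimator in SWPT-Bi, the classic bivariate shrinkage rule of Sendur and Selesnick. That rule shrinks a coefficient y1 jointly with its co-located coefficient y2 at the adjacent coarser scale by the gain max(sqrt(y1² + y2²) − √3·σn²/σ, 0) / sqrt(y1² + y2²). Here σn is the noise standard deviation (estimated with the median-absolute-deviation rule) and σ is a local signal standard deviation computed over a small window. The window size and all helper names (`hard_threshold`, `mad_noise_sigma`, `local_signal_sigma`, `bivariate_shrink`) are hypothetical choices.

```python
import math
import statistics


def hard_threshold(coeffs, t):
    """Hard thresholding: keep coefficients with |w| > t, set the rest to zero."""
    return [w if abs(w) > t else 0.0 for w in coeffs]


def mad_noise_sigma(finest_detail):
    """Robust noise std from the finest detail band: median(|w|) / 0.6745."""
    return statistics.median([abs(w) for w in finest_detail]) / 0.6745


def local_signal_sigma(coeffs, i, sigma_n, half_window):
    """Local signal std around index i: sqrt(max(mean(y^2) - sigma_n^2, 0))."""
    lo, hi = max(0, i - half_window), min(len(coeffs), i + half_window + 1)
    var_y = sum(c * c for c in coeffs[lo:hi]) / (hi - lo)
    return math.sqrt(max(var_y - sigma_n ** 2, 0.0))


def bivariate_shrink(child, parent, sigma_n, half_window=2):
    """Sendur-Selesnick bivariate shrinkage of `child`, guided by `parent`.

    child[i] and parent[i] are co-located coefficients at two adjacent scales.
    """
    out = []
    for i, (y1, y2) in enumerate(zip(child, parent)):
        sigma = local_signal_sigma(child, i, sigma_n, half_window)
        r = math.hypot(y1, y2)
        if sigma == 0.0 or r == 0.0:
            out.append(0.0)  # no local signal energy above the noise floor
            continue
        gain = max(r - math.sqrt(3.0) * sigma_n ** 2 / sigma, 0.0) / r
        out.append(gain * y1)
    return out


# Usage: a strong coefficient with a strong parent survives almost intact;
# coefficients with no local energy above the noise floor are zeroed.
child = [0.1, -0.1, 8.0, 0.1, -0.1]
parent = [0.0, 0.0, 6.0, 0.0, 0.0]
shrunk = bivariate_shrink(child, parent, sigma_n=0.5)
noise_only = bivariate_shrink([0.1, -0.1, 0.1, -0.1], [0.0] * 4, sigma_n=0.5)
```

Because a stationary transform does not downsample, each subband has the full signal length, so `child[i]` and `parent[i]` refer to the same position without any index rescaling; with a decimated transform the parent index would instead be `i // 2`.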