Yu Tianwei, Ye Hui, Sun Wei, Li Ker-Chau, Chen Zugen, Jacobs Sharoni, Bailey Dione K, Wong David T, Zhou Xiaofeng
Department of Biostatistics, Rollins School of Public Health, Emory University, Atlanta, GA, USA.
BMC Bioinformatics. 2007 May 3;8:145. doi: 10.1186/1471-2105-8-145.
DNA copy number aberration (CNA) is one of the key characteristics of cancer cells. Recent studies have demonstrated the feasibility of using high-density single nucleotide polymorphism (SNP) genotyping arrays to detect CNA. Compared with two-color array-based comparative genomic hybridization (array-CGH), SNP arrays offer much higher probe density but a lower signal-to-noise ratio at the single-SNP level. Accurately identifying small segments of CNA from SNP array data therefore requires segmentation methods that are sensitive to CNA yet resistant to noise.
We have developed a highly sensitive edge-detection algorithm for copy number data that is especially suited to SNP array-based copy number data. The method consists of two steps: an over-sensitive edge-detection step, followed by a test-based forward-backward edge selection step.
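To make the two-step idea concrete, the following is a minimal Python sketch, not the FASeg implementation. It assumes a sliding-window mean difference for the over-sensitive detection step and substitutes plain backward elimination with a Welch t statistic for the test-based selection step; the forward part of the published forward-backward procedure is omitted. All function names, window sizes, and thresholds (`candidate_edges`, `select_edges`, `half_window`, `min_jump`, `t_min`) are hypothetical choices made for illustration.

```python
import math
import random
from statistics import mean, variance


def candidate_edges(x, half_window=5, min_jump=0.1):
    """Over-sensitive step (assumed form): keep every local maximum of the
    absolute left/right window mean difference that exceeds a lenient cutoff."""
    n = len(x)
    score = [0.0] * n
    for i in range(half_window, n - half_window + 1):
        left = x[i - half_window:i]
        right = x[i:i + half_window]
        score[i] = abs(mean(right) - mean(left))
    edges = []
    for i in range(half_window, n - half_window + 1):
        lo = max(0, i - half_window)
        hi = min(n, i + half_window + 1)
        if score[i] >= min_jump and score[i] == max(score[lo:hi]):
            edges.append(i)
    return edges


def welch_t(a, b):
    """Absolute Welch t statistic between two segments (0 if either is too short)."""
    if len(a) < 2 or len(b) < 2:
        return 0.0
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0.0:
        return math.inf if mean(a) != mean(b) else 0.0
    return abs(mean(a) - mean(b)) / se


def select_edges(x, edges, t_min=4.0):
    """Selection step (simplified stand-in): repeatedly drop the weakest edge
    until every remaining edge separates its neighbours with |t| >= t_min."""
    edges = sorted(edges)
    while edges:
        bounds = [0] + edges + [len(x)]
        stats = [
            welch_t(x[bounds[k]:bounds[k + 1]], x[bounds[k + 1]:bounds[k + 2]])
            for k in range(len(edges))
        ]
        weakest = min(range(len(edges)), key=stats.__getitem__)
        if stats[weakest] >= t_min:
            break
        del edges[weakest]  # merging the two segments changes neighbouring tests
    return edges


if __name__ == "__main__":
    # Synthetic log-ratio profile (made-up values): a short gain inside a flat region.
    random.seed(0)
    profile = [random.gauss(0.0, 0.2) for _ in range(60)]
    profile += [random.gauss(0.8, 0.2) for _ in range(20)]
    profile += [random.gauss(0.0, 0.2) for _ in range(60)]
    candidates = candidate_edges(profile)
    print("candidate edges:", candidates)
    print("selected edges: ", select_edges(profile, candidates))
```

The first step is deliberately lenient so that small aberrations are not missed; the second step then removes edges whose flanking segments do not differ significantly. Recomputing the test statistics after each removal matters, because merging two segments changes the evidence for the neighbouring edges.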
In simulations constructed from real experimental data, the method shows high sensitivity and specificity in detecting small copy number changes in focused regions. The method is implemented in the R package FASeg, which includes data processing and visualization utilities, as well as libraries for processing Affymetrix SNP array data.