Ben-Yaacov Erez, Eldar Yonina C
Department of Electrical Engineering, Technion-Israel Institute of Technology, Haifa, Israel.
Bioinformatics. 2008 Aug 15;24(16):i139-45. doi: 10.1093/bioinformatics/btn272.
Array Comparative Genomic Hybridization (aCGH) is used to scan the entire genome for variations in DNA copy number. A central task in the analysis of aCGH data is segmenting the probes into groups that share the same DNA copy number. Some well-known segmentation methods suffer from very long running times, which prevents interactive data analysis.
We propose a new segmentation method based on wavelet decomposition and thresholding, which detects significant breakpoints in the data. Our algorithm is over 1000 times faster than leading approaches, with similar performance. Another key advantage of the proposed method is its simplicity and flexibility: because of its intuitive structure, it can easily be generalized to incorporate several types of side information. Here, we consider two extensions: one incorporates side information indicating the reliability of each measurement, and the other compensates for changing variability in the measurement noise. The resulting algorithm outperforms existing methods, in both speed and performance, when applied to real high-density CGH data.
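To make the idea of wavelet decomposition and thresholding concrete, the following is a minimal sketch and not the authors' implementation. It assumes Haar-like difference-of-means coefficients at a few dyadic scales, a noise estimate from the median absolute first difference, and a fixed threshold of k noise standard deviations. The function names (`haar_detail`, `segment`), the scale set, and the value of `k` are illustrative assumptions; the abstract does not specify the thresholding rule actually used.

```python
import numpy as np


def haar_detail(x, scale):
    """Haar-like coefficient at each candidate breakpoint i: the sum of the
    `scale` samples starting at i minus the sum of the `scale` samples
    before i, normalized so white noise keeps its variance."""
    n = len(x)
    c = np.concatenate(([0.0], np.cumsum(x)))  # c[j] = sum(x[:j])
    d = np.zeros(n)
    idx = np.arange(scale, n - scale + 1)      # positions with full windows
    right = c[idx + scale] - c[idx]
    left = c[idx] - c[idx - scale]
    d[idx] = (right - left) / np.sqrt(2.0 * scale)
    return d


def segment(x, scales=(1, 2, 4, 8, 16), k=4.0):
    """Return (breakpoints, fitted), where a breakpoint i means that
    x[i-1] and x[i] fall in different segments. `scales` and `k` are
    assumed values chosen for illustration."""
    x = np.asarray(x, dtype=float)
    # Robust noise estimate: first differences of white noise have
    # standard deviation sqrt(2) * sigma, and median|N(0,1)| is about 0.6745.
    sigma = np.median(np.abs(np.diff(x))) / (0.6745 * np.sqrt(2.0))
    thr = k * sigma
    found = set()
    for s in scales:
        a = np.abs(haar_detail(x, s))
        # Keep local maxima of |coefficient| that exceed the threshold.
        peak = (a[1:-1] > a[:-2]) & (a[1:-1] >= a[2:]) & (a[1:-1] > thr)
        found.update(int(i) for i in np.nonzero(peak)[0] + 1)
    bps = sorted(found)
    # Piecewise-constant fit: replace each segment by its mean.
    fitted = np.concatenate(
        [np.full(len(seg), seg.mean()) for seg in np.split(x, bps)]
    )
    return bps, fitted


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    truth = np.concatenate([np.zeros(100), np.ones(100), np.zeros(100)])
    bps, fitted = segment(truth + 0.1 * rng.standard_normal(truth.size))
    print("detected breakpoints:", bps)
```

Each scale costs time linear in the number of probes because the window sums come from a single cumulative sum, which illustrates why an approach of this kind can be fast. On noisy data the union across scales may report several nearby positions for one true breakpoint, so a practical version would need a merging step and a more principled threshold.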
An implementation is available under the software tab at: http://www.ee.technion.ac.il/Sites/People/YoninaEldar/.