Budinska Eva, Gelnarova Eva, Schimek Michael G
Institute of Biostatistics and Analyses, Masaryk University, Kamenice 126/3, 625 00 Brno, Czech Republic.
Bioinformatics. 2009 Mar 15;25(6):703-13. doi: 10.1093/bioinformatics/btp022. Epub 2009 Jan 15.
Genome analysis has become one of the most important tools for understanding the complex process of carcinogenesis. As the resolution of CGH arrays increases, so does the demand for computationally efficient algorithms that detect aberrations effectively even in very noisy data.
We developed a simple, computationally efficient non-parametric technique for CGH array analysis that uses a median absolute deviation (MAD) concept for breakpoint detection, with median smoothing as a pre-processing step. The resulting algorithm has the potential to outperform any single smoothing approach as well as several recently proposed segmentation techniques. We demonstrate its performance on simulated and real datasets in comparison with three other methods for array CGH analysis.
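To make the general idea concrete, the following is a minimal sketch of combining a running median with a MAD-based threshold to flag candidate breakpoints in a sequence of log-ratios. It is not the authors' implementation, which is written in R and whose details are not given in this abstract. The function names (`median_smooth`, `mad`, `find_breakpoints`), the `window` and `threshold` parameters, and the rule of comparing jumps in the smoothed signal against a multiple of the residual MAD are all assumptions chosen for illustration.

```python
from statistics import median


def median_smooth(values, window=5):
    """Running median; the window is truncated at the ends of the sequence."""
    half = window // 2
    n = len(values)
    return [
        median(values[max(0, i - half):min(n, i + half + 1)])
        for i in range(n)
    ]


def mad(values):
    """Unscaled median absolute deviation from the median."""
    values = list(values)
    center = median(values)
    return median(abs(v - center) for v in values)


def find_breakpoints(log_ratios, window=5, threshold=3.0):
    """Return indices i where the smoothed signal jumps between i-1 and i.

    Illustrative rule (an assumption, not the published method): a jump is
    flagged when it exceeds `threshold` times the MAD of the residuals
    (raw value minus smoothed value), used here as a robust noise estimate.
    """
    smoothed = median_smooth(log_ratios, window)
    residuals = [x - s for x, s in zip(log_ratios, smoothed)]
    cutoff = threshold * mad(residuals)
    return [
        i
        for i in range(1, len(smoothed))
        if abs(smoothed[i] - smoothed[i - 1]) > cutoff
    ]


if __name__ == "__main__":
    # Synthetic example: 20 probes near 0, then 20 probes near 1 (a gain).
    noise = [0.05, -0.03, 0.02, -0.04, 0.01]
    signal = [noise[i % 5] + (1.0 if i >= 20 else 0.0) for i in range(40)]
    print(find_breakpoints(signal))
```

In this sketch the median filter suppresses isolated outliers while preserving sharp level changes, and the MAD gives a noise estimate that is not inflated by the level changes themselves. On real data a single aberration boundary may produce several adjacent flagged positions, so a practical method would need an additional step to merge them.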
Our approach is implemented in the R language and environment for statistical computing (version 2.6.1 for Windows, R-project, 2007). The code is available at: http://www.iba.muni.cz/~budinska/msmad.html.
Supplementary data are available at Bioinformatics online.