Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX 77230, USA and Department of Biophysics, College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang 150081, China.
Bioinformatics. 2013 Nov 1;29(21):2678-82. doi: 10.1093/bioinformatics/btt479. Epub 2013 Sep 16.
Data quality is a critical issue in the analysis of DNA copy number alterations obtained from microarrays. It is commonly assumed that copy number alteration data can be modeled as piecewise constant and that the measurement errors of different probes are independent. However, these assumptions do not always hold in practice. In some published datasets, we find that measurement errors are highly correlated between probes that interrogate nearby genomic loci, and that the piecewise-constant model does not fit the data well. The correlated errors cause problems in downstream analysis, leading to a large number of DNA segments being falsely identified as copy number gains or losses.
We developed a simple tool, called the autocorrelation scanning profile, to assess the dependence of measurement errors between neighboring probes.
The autocorrelation scanning profile can be used to check data quality and refine the analysis of DNA copy number data, as we demonstrate on several typical datasets.
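The general idea of scanning for dependence between neighboring probes can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the profile is a lag-1 sample autocorrelation computed in sliding windows over probes ordered by genomic position, and the function names (`lag1_autocorr`, `scan_autocorrelation`, `simulate_noise`) and the window and step sizes are hypothetical choices made for illustration only.

```python
import random


def lag1_autocorr(values):
    """Lag-1 sample autocorrelation of a sequence (0.0 if it has no variance)."""
    n = len(values)
    mean = sum(values) / n
    denom = sum((v - mean) ** 2 for v in values)
    if denom == 0:
        return 0.0
    num = sum((values[i] - mean) * (values[i + 1] - mean) for i in range(n - 1))
    return num / denom


def scan_autocorrelation(log_ratios, window=100, step=50):
    """Slide a window along position-ordered probes.

    Returns a list of (window_start_index, lag1_autocorrelation) pairs.
    Assumption: each window lies within a region of constant copy number;
    a breakpoint inside a window would also inflate the autocorrelation.
    """
    profile = []
    for start in range(0, len(log_ratios) - window + 1, step):
        chunk = log_ratios[start:start + window]
        profile.append((start, lag1_autocorr(chunk)))
    return profile


def simulate_noise(n, phi, seed):
    """Simulated probe noise: AR(1) with coefficient phi (phi=0 is independent)."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0)]
    for _ in range(n - 1):
        noise.append(phi * noise[-1] + rng.gauss(0.0, 1.0))
    return noise


if __name__ == "__main__":
    for label, phi in (("independent", 0.0), ("correlated", 0.8)):
        profile = scan_autocorrelation(simulate_noise(2000, phi, seed=1),
                                       window=200, step=100)
        mean_r = sum(r for _, r in profile) / len(profile)
        print(f"{label}: mean lag-1 autocorrelation = {mean_r:.2f}")
```

On simulated data, windows of independent noise should give values near zero, while noise with strong neighbor-to-neighbor dependence should give clearly positive values. With real copy number data, the signal would first need to be separated from the measurement error (for example by subtracting segment means), which this sketch does not attempt.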
Supplementary data are available at Bioinformatics online.